(57) Abstract



Disclosed is a nucleic acid sequence encoding a polypeptide having starch branching enzyme (SBE) activity, the encoded polypeptide
comprising an effective portion of the amino acid sequence shown in Figure 4 or Figure 13.

Title: Improvements in or Relating to Starch Content of Plants

Field of the Invention 

This invention relates to novel nucleic acid sequences, vectors and host cells comprising
the nucleic acid sequence(s), to polypeptides encoded thereby, and to a method of altering
a host cell by introducing the nucleic acid sequence(s) of the invention.

Background to the Invention 

Starch consists of two main polysaccharides, amylose and amylopectin. Amylose is a
linear polymer containing α-1,4 linked glucose units, while amylopectin is a highly
branched polymer consisting of an α-1,4 linked glucan backbone with α-1,6 linked glucan
branches. In most plant storage reserves amylopectin constitutes about 75% of the starch
content. Amylopectin is synthesized by the concerted action of soluble starch synthase and
starch branching enzyme [α-1,4 glucan: α-1,4 glucan 6-glycosyltransferase, EC 2.4.1.18].
Starch branching enzyme (SBE) hydrolyses α-1,4 linkages and rejoins the cleaved glucan,
via an α-1,6 linkage, to an acceptor chain to produce a branched structure. The physical
properties of starch are strongly affected by the relative abundance of amylose and
amylopectin, and SBE is therefore a crucial enzyme in determining both the quantity and
quality of starches produced in plant systems.

Starches are commercially available from several plant sources including maize, potato and 
cassava. Each of these starches has unique physical characteristics and properties and a 
variety of possible industrial uses. In maize there are a number of naturally occurring
mutants which have altered starch composition, such as high amylopectin types ("waxy"
starches) or high amylose starches, but in potato and cassava no such mutants exist on a
commercial basis as yet.

Genetic modification offers the possibility of obtaining new starches which may have novel
and potentially useful characteristics. Most of the work to date has involved potato plants
because they are amenable to genetic manipulation, i.e. they can be transformed using
Agrobacterium and regenerated easily from tissue culture. In addition many of the genes
involved in starch biosynthesis have been cloned from potato and thus are available as
targets for genetic manipulation, for example, by antisense inhibition of expression or
sense suppression.

Cassava (Manihot esculenta L. Crantz) is an important crop in the tropics, where its 
starch-filled roots are used both as a food source and increasingly as a source of starch.
Cassava is a high yielding perennial crop that can grow on poor soils and is also tolerant
of drought. Cassava starch, being a root-derived starch, has properties similar but not
identical to potato starch and is composed of 20-25% amylose and 75-80% amylopectin
(Rickard et al., 1991 Trop. Sci. 31, 189-207). Some of the genes involved in starch
biosynthesis have been cloned from cassava, including starch branching enzyme I (SBE
I) (Salehuzzaman et al., 1994 Plant Science 98, 53-62), and granule bound starch synthase
I (GBSS I) (Salehuzzaman et al., 1993 Plant Molecular Biology 23, 947-962) and some
work has been done on their expression patterns, although only in in vitro grown plants
(Salehuzzaman et al., 1994 Plant Science 98, 53-62).

In most plants studied to date, e.g. maize (Boyer & Preiss, 1978 Biochem. Biophys. Res.
Comm. 80, 169-175), rice (Smyth, 1988 Plant Sci. 57, 1-8) and pea (Smith, Planta 175,
270-279), two forms of SBE have been identified, each encoded by a separate gene. A
recent review by Burton et al., (1995 The Plant Journal 7, 3-15) has demonstrated that the
two forms of SBE constitute distinct classes of the enzyme such that, in general, enzymes
of the same class from different plants may exhibit greater similarity than enzymes of
different classes from the same plant. In their review, Burton et al. termed the two
respective enzyme families class "A" and class "B", and the reader is referred thereto (and
to the references cited therein) for a detailed discussion of the distinctions between the two
classes. One general distinction of note would appear to be the presence, in class A SBE
molecules, of a flexible N-terminal domain, which is not found in class B molecules. The
distinctions noted by Burton et al. are relied on herein to define class A and class B SBE
molecules, which terms are to be interpreted accordingly.

Many organisations have interests in obtaining modified cassava starches by means of
genetic modification. This is impossible to achieve, however, unless the plant is amenable
to transformation and regeneration, and the starch biosynthesis genes which are to be
targeted for modification have been cloned. The production of transgenic cassava plants has
only recently been demonstrated (Taylor et al., 1996 Nature Biotechnology 14, 726-730;
Schopke et al., 1996 Nature Biotechnology 14, 731-735; and Li et al., 1996 Nature
Biotechnology 14, 736-740). The present invention concerns the identification, cloning
and sequencing of a starch biosynthetic gene from cassava, suitable as a target for genetic
manipulation.

Summary of the Invention 

In a first aspect the invention provides a nucleic acid sequence encoding a polypeptide 
having starch branching enzyme (SBE) activity, the polypeptide comprising an effective 
portion of the amino acid sequences shown in Figure 4 or Figure 13. The nucleic acid 
is conveniently in substantial isolation, especially in isolation from other naturally 
associated nucleic acid sequences. 

An "effective portion" of the amino acid sequences may be defined as a portion which
retains sufficient SBE activity when expressed in E. coli KV832 to complement the
branching enzyme mutation therein. The amino acid sequences shown in Figures 4 and
13 include the N terminal transit peptide, which comprises about the first 50 amino acid
residues. As those skilled in the art will be well aware, such a transit peptide is not
essential for SBE activity. Thus the mature polypeptide, lacking a transit peptide, may
be considered as one example of an effective portion of the amino acid sequence shown
in Figure 4 or Figure 13.

Other effective portions may be obtained by effecting minor deletions in the amino acid
sequence, whilst substantially preserving SBE activity. Comparison with known class A
SBE sequences, with the benefit of the disclosure herein, will enable those skilled in the
art to identify regions of the polypeptide which are less well conserved and so amenable
to minor deletion, or amino acid substitution (particularly, conservative amino acid
substitution) whilst substantially preserving SBE activity. Such less well-conserved
regions are generally found in the N terminal amino acid residues (up to the triple proline
"elbow" at residues 138-140 in Figure 4 and up to the proline elbow at residues 143-145
in Figure 13) and in the last 50 residues or so of the C terminal, and in particular in the
acidic tail of the C terminal.

Conveniently the nucleic acid sequence is obtainable from cassava, preferably obtained 
therefrom, and typically encodes a polypeptide obtainable from cassava. In a particular 
embodiment, the encoded polypeptide may have the amino acid sequence NSKH at about 
position 697 (in relation to Figure 4), which sequence appears peculiar to an isoform of 
the SBE class A enzyme of cassava, other class A SBE enzymes having the conserved 
sequence DA D/E Y (Burton et al., 1995 cited above).
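
The motif distinction described above can be checked mechanically on any candidate protein sequence. The following is a minimal sketch only: the function name `classify_motif`, the search window and the toy sequences are illustrative assumptions, and no sequence data from Figure 4 is reproduced here.

```python
import re

# Motif conserved in other class A SBEs, written "DA D/E Y" in the text.
CONSERVED_MOTIF = re.compile(r"DA[DE]Y")
# Variant reported for the cassava isoform at about position 697.
CASSAVA_VARIANT = "NSKH"


def classify_motif(protein: str, position: int, window: int = 10) -> str:
    """Report which motif occurs near a 1-based residue position.

    The window (an assumption) allows for the "at about position" wording.
    """
    start = max(0, position - 1 - window)
    end = position - 1 + window + len(CASSAVA_VARIANT)
    region = protein.upper()[start:end]
    if CASSAVA_VARIANT in region:
        return "cassava-variant"
    if CONSERVED_MOTIF.search(region):
        return "conserved"
    return "neither"


# Toy sequences (hypothetical, not taken from the figures).
toy_variant = "M" * 20 + "NSKH" + "G" * 20
toy_conserved = "M" * 20 + "DADY" + "G" * 20

if __name__ == "__main__":
    print(classify_motif(toy_variant, 21))
    print(classify_motif(toy_conserved, 21))
```

With a real sequence, the call would be `classify_motif(sequence, 697)`, where `sequence` is the one-letter amino acid string read from the relevant figure.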

In a particular aspect of the invention there is provided a nucleic acid comprising a portion 
of nucleotides 21 to 2531 of the nucleic acid sequence shown in Figure 4, or a functionally
equivalent nucleic acid sequence. Such functionally equivalent nucleic acid sequences
include, but are not limited to, those sequences which encode substantially the same amino
acid sequence but which differ in nucleotide sequence from that shown in Figure 4 by
virtue of the degeneracy of the genetic code. For example, a nucleic acid sequence may
be altered (e.g. "codon optimised") for expression in a host other than cassava, such that
the nucleotide sequence differs substantially whilst the amino acid sequence of the encoded
polypeptide is unchanged. Other functionally equivalent nucleic acid sequences are those
which will hybridise under stringent hybridisation conditions (e.g. as described by
Sambrook et al., Molecular Cloning: A Laboratory Manual, CSH, i.e. washing with
0.1xSSC, 0.5% SDS at 68°C) with the sequence shown in Figure 4. Figure 10 shows a
functionally equivalent sequence designated "125 + 94", which includes a region
corresponding to the 3' coding portion of the sequence in Figure 4. Figure 13 shows a
functionally equivalent sequence which comprises a second complete SBE coding sequence
(the SBE-derived sequence is from nucleotides 35 to 2760, of which the coding sequence
is nucleotides 131-2677; the rest of the sequence in the figure is vector-derived).
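
The point about degeneracy of the genetic code can be illustrated with a short sketch: two nucleotide sequences that differ at several positions may still encode the same amino acids. The code below uses the standard genetic code; the input is the first 18 nucleotides of the SBE A oligonucleotide read as six codons, which is an assumption made purely to have a toy input, and the "recoded" variant is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


original = "ATGGACAAGGATATGTAT"   # toy input, reading frame assumed
recoded = "ATGGATAAAGACATGTAC"    # hypothetical synonymous variant

if __name__ == "__main__":
    print(translate(original), translate(recoded))
    print(count_differences(original, recoded), "nucleotide differences")
```

Both strings translate to the same peptide although four of the eighteen nucleotides differ, which is the sense in which a codon-optimised sequence remains functionally equivalent.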

Functionally equivalent DNA sequences will preferably comprise at least 200-300bp, more
preferably 300-600bp, and will exhibit at least 88% identity (more preferably at least 90%,
and most preferably at least 95% identity) with the corresponding region of the DNA
sequence shown in Figures 4 or 10. Those skilled in the art will readily be able to conduct
a sequence alignment between the putative functionally equivalent sequence and those
detailed in Figures 4 or 10 - the identity of the two sequences is to be compared in those
regions which are aligned by standard computer software, which aligns corresponding
regions of the sequences.
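
As a rough illustration of the identity criterion, the sketch below computes percent identity over the aligned columns of two pre-aligned sequences and applies the length and identity thresholds stated above. It assumes the alignment has already been produced by other software and is supplied as two equal-length strings with "-" for gaps; excluding gap columns from the comparison is an assumption, since the text does not say how gaps are to be scored.

```python
def aligned_columns(aligned_a: str, aligned_b: str) -> int:
    """Count columns where both sequences have a residue (no gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free aligned columns."""
    compared = aligned_columns(aligned_a, aligned_b)
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-" and x == y
    )
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str,
                    min_length: int = 200, min_identity: float = 88.0) -> bool:
    """Apply the minimum length and minimum identity criteria together."""
    return (aligned_columns(aligned_a, aligned_b) >= min_length
            and percent_identity(aligned_a, aligned_b) >= min_identity)


# Toy alignment (hypothetical): 240 columns with a mismatch every 10th base.
toy_a = "ACGT" * 60
toy_b = "".join("N" if i % 10 == 0 else c for i, c in enumerate(toy_a))

if __name__ == "__main__":
    print(percent_identity(toy_a, toy_b))
    print(meets_threshold(toy_a, toy_b))
    print(meets_threshold(toy_a, toy_b, min_identity=95.0))
```

The stricter 90% and 95% preferences can be tested by passing a different `min_identity`.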

In particular embodiments the nucleic acid sequence may alternatively comprise a 5'
and/or a 3' untranslated region ("UTR"), examples of which are shown in Figures 2 and
4. Figure 9 includes a 3' UTR, as nucleotides 688-1044, and Figure 10 includes 3' UTR
as nucleotides 1507-1900 (which nucleotides correspond to the first base after the "stop"
codon to the base immediately preceding the poly (A) tail). Any one of the sequences
defined above, or a functional equivalent thereof (as defined by hybridisation properties,
as set out in the preceding paragraph), could be useful in sense or anti-sense inhibition of
corresponding genes, as will be apparent to those skilled in the art. It will also be
apparent to those skilled in the art that such regions may be modified so as to optimise
expression in a particular type of host cell and that the 5' and/or 3' UTRs could be used
in isolation, or in combination with a coding portion of the sequence of the invention.
Similarly, a coding portion could be used without a 5' or a 3' UTR if desired.

In a further aspect, the invention provides a replicable nucleic acid construct comprising 
any one of the nucleic acid sequences defined above. The construct will typically 
comprise a selectable marker and may allow for expression of the nucleic acid sequence 
of the invention. Conveniently the vector will comprise a promoter (especially a promoter 
sequence operable in a plant and/or a promoter operable in a bacterial cell) and one or
more regulatory signals known to those skilled in the art.

In another aspect the invention provides a polypeptide having SBE activity, the polypeptide
comprising an effective portion of the amino acid sequence shown in Figure 4 or Figure
13. The polypeptide is conveniently one obtainable from cassava, although it may be
derived using recombinant DNA techniques. The polypeptide is preferably in substantial
isolation from other polypeptides of plant origin, and more preferably in substantial
isolation from any other polypeptides. The polypeptide may have amino acid residues
NSKH at about position 697 (in the sequence shown in Figure 4), instead of the sequence
DA D/E Y found in other SBE class A polypeptides. The polypeptide may be used in a
method of modifying starch in vitro, the method comprising treating starch under suitable
conditions (of temperature, pH etc.) with an effective amount of the polypeptide.

Those skilled in the art will appreciate that the disclosure of the present specification can 
be utilised in a number of ways. In particular, the characteristics of a host cell may be 
altered by recombinant DNA techniques. Thus, in a further aspect, there is provided a 
method by which a host cell may be altered by introduction of a nucleic acid sequence 
comprising at least 200bp and exhibiting at least 88% sequence identity (more preferably
at least 90%, and most preferably at least 95% identity) with the corresponding region of
the DNA sequence shown in Figures 4, 9, 10 or 13, operably linked in the sense or
(preferably) in the anti-sense orientation to a suitable promoter active in the host cell, and
causing transcription of the introduced nucleic acid sequence, said transcript and/or the
translation product thereof being sufficient to interfere with the expression of a
homologous gene naturally present in said host cell, which homologous gene encodes a
polypeptide having SBE activity. The altered host cell is typically a plant cell, such as a
cell of a cassava, banana, potato, sweet potato, tomato, pea, wheat, barley, oat, maize,
or rice plant.

Desirably the method further comprises the introduction of one or more nucleic acid 
sequences which are effective in interfering with the expression of other homologous gene 
or genes naturally present in the host cell. Such other genes whose expression is inhibited 
may be involved in starch biosynthesis (e.g. an SBE I gene), or may be unrelated to SBE
II. 

Those skilled in the art will be aware that both anti-sense inhibition, and "sense 
suppression" of expression of genes, especially plant genes, has been demonstrated (e.g. 
Matzke & Matzke 1995 Plant Physiol. 107, 679-685).

It is believed that antisense methods are mainly operable by the production of antisense
mRNA which hybridises to the sense mRNA, preventing its translation into functional
polypeptide, possibly by causing the hybrid RNA to be degraded (e.g. Sheehy et al., 1988
PNAS 85; Van der Krol et al., Mol. Gen. Genet. 220, 204-212). Sense
suppression also requires homology between the introduced sequence and the target gene,
but the exact mechanism is unclear. It is apparent however that, in relation to both
antisense and sense suppression, neither a full length nucleotide sequence, nor a "native"
sequence is essential. Preferably the nucleic acid sequence used in the method will
comprise at least 200-300bp, more preferably at least 300-600bp, of the full length
sequence, but by simple trial and error other fragments (smaller or larger) may be found
which are functional in altering the characteristics of the plant. It is also known that
untranslated portions of sequence can suffice to inhibit expression of the homologous gene
- coding portions may be present within the introduced sequence, but they do not appear
to be essential under all circumstances.

The inventors have discovered that there are at least two class A SBE genes in cassava. 
A fragment of a second gene has been isolated, which fragment directs the expression of
the C terminal 481 amino acids of cassava class A SBE (see Figure 10) and comprises a
3' untranslated region. Subsequently, a complete clone of the second gene was also
recovered (see Figure 12). The coding portions of the two genes show some slight
differences, and the second SBE gene may be considered as functionally equivalent to the
corresponding portion of the nucleotide sequence shown in Figure 4. However, the 3'
untranslated regions of the two genes show marked differences. Thus the method of
altering a host cell may comprise the use of a sufficient portion of either gene so as to
inhibit the expression of the naturally occurring homologous gene. Conveniently, a 
portion of nucleotide sequence is employed which is conserved between both genes. 
Alternatively, sufficient portions of both genes may be employed, typically using a single 
construct to direct the transcription of both introduced sequences. 

In addition, as explained above, it may be desired to cause inhibition of expression of the
class B SBE (i.e. SBE I) in the same host cell. A number of class B SBE gene sequences
are known, including that of the cassava class B SBE (Salehuzzaman et al., 1994
Plant Science 98, 53-62), and any one of those may prove suitable. Preferably the
sequence used is that which derives from the host cell sought to be altered (e.g. when
altering the characteristics of a cassava plant cell, it is generally preferred to use sense or
anti-sense sequences corresponding exactly to at least portions of the cassava gene whose
expression is sought to be inhibited).

In a further aspect the invention provides an altered host cell, into which has been 
introduced a nucleic acid sequence comprising at least 200bp and exhibiting at least 88%
sequence identity (more preferably at least 90%, and most preferably at least 95% identity)
with the corresponding region of the DNA sequence shown in Figures 4, 9, 10 or 13,
operably linked in the sense or anti-sense orientation to a suitable promoter, said host cell
comprising a natural gene sharing sequence homology with the introduced sequence.

The host cell may be a micro-organism (such as a bacterial, fungal or yeast cell) or a plant 
cell. Conveniently the host cell altered by the method is a cell of a cassava plant, or 
another plant with starch storage reserves, such as banana, potato, sweet potato, tomato, 
pea, wheat, barley, oat, maize, or rice plant. Typically the sequence will be introduced
in a nucleic acid construct, by way of transformation, transduction, micro-injection or 
other method known to those skilled in the art. The invention also provides for a plant 
into which has been introduced a nucleic acid sequence of the invention, or the progeny 
of such a plant. 

The altered plant cell will preferably be grown into an altered plant, using techniques of
plant growth and cultivation well-known to those skilled in the art of re-generating
plantlets from plant cells.

The invention also provides a method of obtaining starch from an altered plant, the plant
being obtained by the method defined above. Starch may be extracted from the plant by
any of the known techniques (e.g. milling). The invention further provides starch
obtainable from a plant altered by the method defined above, the starch having altered
properties compared to starch extracted from an equivalent but unaltered plant.
Conveniently the altered starch is obtained from an altered plant selected from the group
consisting of cassava, potato, pea, tomato, maize, wheat, barley, oat, sweet potato and
rice. Typically the altered starch will have increased amylose content.

The invention will now be further described by way of illustrative examples and with 
reference to the accompanying drawings, in which:- 

Figure 1 is a schematic illustration of the cloning strategy for cassava SBE II. The top line
represents the size of a full length clone with distances in kilobases (kb) and arrows
representing oligonucleotides (rightward pointing arrows are sense strand, leftward are on
opposite strand). The long thick arrow is the open reading frame with start and stop
codons shown. Below this are shown the 3' RACE, 5' RACE and PCR clones identified
either by the plasmid name (shown in brackets above the line) or the clone number (shown
to the left of the clone) for the 5' RACE only. Also shown (by an x) in the 5' RACE
clones are positions of small deletions or introns.

Figure 2 shows the DNA sequence and predicted ORF of csbe2con.seq. This sequence 
is a consensus of 3' RACE pSJ94 and 5' RACE clones 27/9, 11 and 28. The first 64 base
pairs are derived from the RoRidT17 adaptor primer/dT tail followed by the SBE
sequence. The one long open reading frame is shown in one letter code below the double
strand DNA sequence. Also shown is the upstream ORF (MQI ... LPW).

Figure 3 shows an alignment of the 5' region of cassava SBE II csbe2con and pSJ9V 
(clones 20 and 35) DNA sequences. Differences from the consensus sequence are shaded. 

Figure 4 shows the DNA sequence and predicted ORF of full length cassava SBE II tuber
cDNA in pSJ107. The sequence shown is from the CSBE214 to the CSBE218
oligonucleotide. The DNA sequence is Seq ID No. 28 in the attached sequence
listing; the amino acid sequence is Seq ID No. 29.

Figure 5 shows an alignment of the 3' region of cassava SBE II pSJ116 and 125 + 94 DNA
sequences. The top line is the 125 + 94 sequence and the bottom the pSJ116 sequence.
Identical nucleotides are indicated by the same letter in the middle line, differences are
indicated by a gap, and dashed lines indicate gaps introduced to optimise alignment.

Figure 6 shows an alignment of the carboxy terminal region of pSJ116 and 125 + 94 protein
sequences. The top sequence is from 125 + 94 and the bottom from pSJ116. Identical
amino acid residues are shown with the same letter, conserved changes with a colon and
neutral changes with a period.

Figure 7 shows a phylogenetic tree of starch branching enzyme proteins. The length of
each pair of branches represents the distance between sequence pairs. The scale beneath
the tree measures the distance between sequences (units indicate the number of substitution
events). Dotted lines indicate a negative branch length because of averaging the tree.
Zmconl2.pro is maize SBE II, psstbi.pro is pea SBE I (Bhattacharyya et al., 1990 Cell 60,
115-121) and atsbe2-1 & 2-2.pro are two SBE II proteins from Arabidopsis thaliana
(Fisher et al., 1996 Plant Mol. Biol. 30, 97-108). SJ107.pro is representative of a cassava
SBE II sequence, and potsbe2.pro is a potato SBE II sequence known to the inventors.

Figure 8 is an alignment of SBE II proteins. Protein sequences are indicated in one letter 
code. The top line represents the consensus sequence, below which is shown the 
consensus ruler and the individual SBE II sequences. Residues matching the consensus 
are shaded. Dashes represent gaps introduced to optimise alignment. Sequence identities
are shown at the right of the figure and are as Figure 7, except that SJ107.pro is cassava
SBE II.

Figure 9 shows the DNA sequence and predicted ORF of a cassava SBE II cDNA isolated
by 3' RACE (plasmid pSJ101).

Figure 10 shows the consensus DNA sequence and predicted ORF of a second cassava 
SBE II cDNA isolated by 5' and 3' RACE (sequence designated 125 + 94 is from plasmids
pSJ125 and pSJ94, spliced at the CSBE217 oligo sequence).

Figure 11 is a schematic diagram of the plant transformation vector pSJW. The black line
represents the DNA sequence. The hashed line represents the bacterial plasmid backbone
(containing the origin of replication and bacterial selection marker) and is not shown in
full. The filled triangles represent the T-DNA borders (RB = right border, LB = left
border). Relevant restriction enzyme sites are shown above the black line with the
approximate distances (in kilobases) between sites marked by an asterisk shown
underneath. The thinnest arrows represent polyadenylation signals (pAnos = nopaline
synthase, pAg7 = Agrobacterium gene 7), the intermediate arrows represent protein
coding regions (SBE II = cassava SBE II, HYG = hygromycin resistance gene) and the
thick arrows represent promoter regions (P-2x35S = double CaMV 35S promoter, P-nos
= nopaline synthase promoter).

Figure 12 is a schematic illustration of the cloning strategy used to isolate a second 
cassava SBE II gene. The top line represents the size of a full length clone with distances
in kilobases (kb) and arrows representing oligonucleotides (rightward pointing arrows are
sense strand, leftward are on opposite strand). The long thick arrow is the open reading
frame with start and stop codons shown. Below this are shown the 3' RACE, 5' RACE
and PCR clones identified either by the plasmid name (shown in brackets above the line)
or the clone number (shown to the right of the clone).

Figure 13 shows the DNA sequence and predicted ORF of a second full length cassava
SBE II tuber cDNA in pSJ146. Nucleotides 35-2760 are SBE II sequence and the
remainder are from the pT7Blue vector. The DNA sequence of Figure 13 is Seq ID No.
30, and the amino acid sequence is Seq ID No. 31, in the attached sequence listing.

Example 1 

This example relates to the isolation and cloning of SBE II sequences from cassava.

Recombinant DNA manipulations

Standard procedures were performed essentially according to Sambrook et al. (1989,
Molecular Cloning: A Laboratory Manual, 2nd edn. Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y.). DNA sequencing was performed on an ABI automated DNA
sequencer and sequences manipulated using DNASTAR software for the Macintosh.

Rapid Amplification of cDNA ends (RACE) and PCR conditions

5' and 3' RACE were performed essentially according to Frohman et al., (1988 Proc.
Natl. Acad. Sci. USA 85, 8998-9002) but with the following modifications.

For 3' RACE, 5 µg of total RNA was reverse transcribed using 5 pmol of the RACE
adaptor RoRidT17 as primer and Stratascript RNAse H- reverse transcriptase (50 U) in
a 50 µl reaction according to the manufacturer's instructions (Stratagene). The reaction
was incubated for 1 hour at 37°C and then diluted to 200 µl with TE (10 mM Tris HCl,
1 mM EDTA) pH 8 and stored at 4°C. 2.5 µl of this cDNA was used in a 25 µl PCR
reaction with 12.5 pmol of SBE A and Ro primers for 30 cycles of 94°C 45 sec, 50°C
25 sec, 72°C 1 min 30 sec. A second round of PCR (25 cycles) was performed using 1
µl of this reaction as template in a 50 µl reaction under the same conditions. Amplified
products were separated by agarose gel electrophoresis and cloned into the pT7Blue vector
(Invitrogen).
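
The quantities in the 3' RACE protocol lend themselves to a quick arithmetic check. The sketch below takes the figures as 5 µg RNA, a 200 µl final cDNA volume, 2.5 µl template, 12.5 pmol primer in 25 µl, and cycles of 45 sec, 25 sec and 90 sec; the function names are illustrative, the "RNA equivalent" ignores reverse transcription efficiency, and the programme time counts hold steps only (no ramping), all of which are simplifying assumptions.

```python
# Quantities taken from the 3' RACE description.
RNA_INPUT_UG = 5.0
CDNA_VOLUME_UL = 200.0      # volume after dilution with TE
TEMPLATE_UL = 2.5           # cDNA added to the first-round PCR
PRIMER_PMOL = 12.5
PCR_VOLUME_UL = 25.0
# (temperature in deg C, hold time in seconds) for one cycle
CYCLE_STEPS = [(94, 45), (50, 25), (72, 90)]


def rna_equivalent_ng(rna_ug: float, cdna_volume_ul: float, template_ul: float) -> float:
    """Mass of input RNA represented by the template aliquot, in ng."""
    return rna_ug * 1000.0 * template_ul / cdna_volume_ul


def primer_concentration_um(pmol: float, volume_ul: float) -> float:
    """pmol per µl is numerically equal to µmol per litre (µM)."""
    return pmol / volume_ul


def hold_time_minutes(cycles: int, steps) -> float:
    """Total hold time of the cycling programme, excluding ramp times."""
    return cycles * sum(seconds for _, seconds in steps) / 60.0


if __name__ == "__main__":
    print(rna_equivalent_ng(RNA_INPUT_UG, CDNA_VOLUME_UL, TEMPLATE_UL), "ng RNA equivalent")
    print(primer_concentration_um(PRIMER_PMOL, PCR_VOLUME_UL), "µM each primer")
    print(hold_time_minutes(30, CYCLE_STEPS), "min for the first round")
    print(round(hold_time_minutes(25, CYCLE_STEPS), 1), "min for the second round")
```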

For the first round of 5' RACE, 5 µg of total leaf RNA was reverse transcribed as
described above using 10 pmol of the SBE II gene specific primer CSBE22. This primer
was removed from the reaction by diluting to 500 µl with TE and centrifuging twice
through a Centricon 100 microconcentrator. The concentrated cDNA was then dA-tailed
with 9 U of terminal deoxynucleotide transferase and 50 µM dATP in a 20 µl reaction in
buffer supplied by the manufacturer (BRL). The reaction was incubated for 10 min at
37°C and 5 min at 65°C and then diluted to 200 µl with TE pH 8. PCR was performed
in a 50 µl volume using 5 µl of tailed cDNA, 2.5 pmol of RoRidT17 and 25 pmol of Ro
and CSBE24 primers for 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 3 min. Amplified
products were separated on a 1% TAE agarose gel, cut out, 200 µl of TE was added and
melted at 99°C for 10 min. Five µl of this was re-amplified in a 50 µl volume using
CSBE25 and Ri as primers and 25 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 1 min 30
sec. Amplified fragments were separated on a 1% TAE agarose gel, purified on DEAE
paper and cloned into pT7Blue.

The second round of 5' RACE was performed using CSBE28 and 29 primers in the first
and second round PCR reactions respectively using a new A-tailed cDNA library primed
A third round of 5' RACE was performed on the same C'SBEJ" primed cDNA

Repeat 3' RACE and PCR Cloning

The 3' RACE library (RoRidT17 primed leaf RNA) was used as a template. The first PCR
reaction was diluted 1:20 and 1 µl was used in a 50 µl PCR reaction with SBE A and Ri
primers and the products were cloned into pT7Blue. The cloned PCR products were
screened for the presence or absence of the CSBE23 oligo by colony PCR.

A full length cDNA of cassava SBE II was isolated by PCR from leaf or root cDNA
(RoRidT17 primed) using primers CSBE214 and CSBE218 from 2.5 µl of cDNA in a 25
µl reaction and 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 2 min.

Complementation of E. coli mutant KV832 

SBE II containing plasmids were transformed into the branching enzyme deficient mutant
E. coli KV832 (Keil et al., 1987 Mol. Gen. Genet. 207, 294-301) and cells grown on
solid PYG media (0.85% KH2PO4, 1.1% K2HPO4, 0.6% yeast extract) containing 1.0%
glucose. To test for complementation, a loop of cells was scraped off and resuspended
in 150 µl water to which was added 15 µl of Lugol's solution (2 g KI and 1 g I2 per 300
ml water).
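
For convenience, the PYG recipe can be scaled to any batch volume. This is a small sketch that assumes the percentages are weight per volume (g per 100 ml), which the text does not state explicitly; the gelling agent needed for solid media is not specified in the text and is therefore omitted.

```python
# Component percentages from the PYG recipe, assumed to be w/v.
PYG_PERCENT_WV = {
    "KH2PO4": 0.85,
    "K2HPO4": 1.1,
    "yeast extract": 0.6,
    "glucose": 1.0,
}


def grams_for_volume(percent_wv: float, volume_ml: float) -> float:
    """Grams needed for a given volume at a w/v percentage."""
    return percent_wv * volume_ml / 100.0


def recipe(volume_ml: float) -> dict:
    """Grams of each component for the requested batch volume."""
    return {
        name: round(grams_for_volume(pct, volume_ml), 3)
        for name, pct in PYG_PERCENT_WV.items()
    }


if __name__ == "__main__":
    for name, grams in recipe(1000).items():
        print(f"{name}: {grams} g per litre")
```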

RNA isolation 

RNA was isolated from cassava plants by the method of Logemann (1987 Anal. Biochem.
163, 21-26). Leaf RNA was isolated from 0.5 gm of in vitro grown plant tissue. The
total yield was 300 µg. Three month old roots (88 gm) were used for isolation of root
RNA.

SBE II specific oligonucleotides 

SBE A     ATGGACAAGGATATGTATGA  (Seq ID No. 1)
CSBE21    GGTTTCATGACTTCTGAGCA  (Seq ID No. 2)
CSBE22    TGCTCAGAAGTCATGAAACC  (Seq ID No. 3)
CSBE23    TCCAGTCTCAATATACGTCG  (Seq ID No. 4)
CSBE24    AGGAGTAGATGGTCTGTCGA  (Seq ID No. 5)
CSBE25    TCATACATATCCTTGTCCAT  (Seq ID No. 6)
CSBE26    GGGTGACTTCAATGATGTAC  (Seq ID No. 7)
CSBE27    GGTGTACATCATTGAAGTCA  (Seq ID No. 8)
CSBE28    AATTACTGGCTCCGTACTAC  (Seq ID No. 9)
CSBE29    CATTCCAACGTGCGACTCAT  (Seq ID No. 10)
CSBE210   TACCGCTAATCTAGGTGTTG  (Seq ID No. 11)
CSBE211   GGACCTTGGTTTAGATCCAA  (Seq ID No. 12)
CSBE212   ATGAGTCGCACGTTGGAATG  (Seq ID No. 13)
CSBE213   CAACACCTAGATTACCGGTA  (Seq ID No. 14)
CSBE214   TTAGTTGCGTCAGTTCTCAC  (Seq ID No. 15)
CSBE215   AATATCTATCTCAGCCGGAG  (Seq ID No. 16)
CSBE216   ATCTTAGATAGTCTGCATCA  (Seq ID No. 17)
CSBE217   TGGTTGTTCCCTGGAATTAC  (Seq ID No. 18)
CSBE218   TGCAAGGACCGTGACATCAA  (Seq ID No. 19)
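
Several of the oligonucleotides listed above are exact reverse complements of one another, i.e. the same site primed in opposite directions. The sketch below finds such pairs; the sequences are copied from the list with OCR spacing removed, the label CSBE214 is inferred from the sequence's position between CSBE213 and CSBE215, and the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


OLIGOS = {
    "SBE A": "ATGGACAAGGATATGTATGA",
    "CSBE21": "GGTTTCATGACTTCTGAGCA",
    "CSBE22": "TGCTCAGAAGTCATGAAACC",
    "CSBE23": "TCCAGTCTCAATATACGTCG",
    "CSBE24": "AGGAGTAGATGGTCTGTCGA",
    "CSBE25": "TCATACATATCCTTGTCCAT",
    "CSBE26": "GGGTGACTTCAATGATGTAC",
    "CSBE27": "GGTGTACATCATTGAAGTCA",
    "CSBE28": "AATTACTGGCTCCGTACTAC",
    "CSBE29": "CATTCCAACGTGCGACTCAT",
    "CSBE210": "TACCGCTAATCTAGGTGTTG",
    "CSBE211": "GGACCTTGGTTTAGATCCAA",
    "CSBE212": "ATGAGTCGCACGTTGGAATG",
    "CSBE213": "CAACACCTAGATTACCGGTA",
    "CSBE214": "TTAGTTGCGTCAGTTCTCAC",  # name inferred from list position
    "CSBE215": "AATATCTATCTCAGCCGGAG",
    "CSBE216": "ATCTTAGATAGTCTGCATCA",
    "CSBE217": "TGGTTGTTCCCTGGAATTAC",
    "CSBE218": "TGCAAGGACCGTGACATCAA",
}


def complementary_pairs(oligos: dict) -> list:
    """List (name_a, name_b) pairs where b is the exact reverse complement of a."""
    names = list(oligos)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if reverse_complement(oligos[a]) == oligos[b]
    ]


if __name__ == "__main__":
    for a, b in complementary_pairs(OLIGOS):
        print(f"{a} <-> {b}")
```

Only exact, full-length matches are reported; oligonucleotides that overlap the same site with an offset are not detected by this simple comparison.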



RESULTS 

Cloning of a SBE II gene from cassava leaf 

The strategy for cloning a full length cDNA of starch branching enzyme II of cassava is 
shown in Figure 1. A comparison of several SBE II (class A) SBE DNA sequences
identified a 23 bp region which appears to be completely conserved among most genes
(data not shown) and is positioned about one kilobase upstream from the 3' end of the
gene. An oligonucleotide primer (designated SBE A) was made to this sequence and used
to isolate a partial cDNA clone by 3' RACE PCR from first strand leaf cDNA as
illustrated in Figure 1. An approximately 1100 bp band was amplified, cloned into
pT7Blue vector and sequenced. This clone was designated pSJ94 and contained a 1120
bp insert starting with the SBE A oligo and ending with a polyA tail. There was a 
predicted open reading frame of 235 amino acids which was highly homologous {WA 
identical) to a potato SBE II also isolated by the inventors (data not shown) suggesting that 
this clone represented a class A (SBE II) gene. 

To obtain the sequence of a full length clone, nested primers were made complementary
to the 5' end of this sequence and used in 5' RACE PCR to isolate clones from the 5'
region of the gene. A total of three rounds of 5' RACE was needed to determine the
sequence of the complete gene (i.e. one that has a predicted long ORF preceded by stop
codons). It should be noted that during this cloning process several clones (#23, 9, 16)
were obtained that had small deletions and in one case (clone 23) there was also a small
(120 bp) intron present. These occurrences are not uncommon and probably arise through
errors in the PCR process and/or reverse transcription of incompletely processed RNA
(heterogeneous nuclear RNA).

The overlapping cDNA fragments could be assembled into a contiguous 3 kb sequence
(designated csbe2con.seq) which contained one long predicted ORF as shown in Figure
2. Several clones in the last round of 5' RACE were obtained which included sequence
of the untranslated leader (UTL). All of these clones had an ORF (42 amino acids) 46 bp
upstream and out of frame with that of the long ORF.

There is more than one SBE II gene in cassava 

In order to determine if the assembled sequence represented that of a single gene, attempts
were made to recover by PCR a full length SBE II gene using primers CSBE214 and
CSBE23 at the 5' and 3' ends of the csbe2con sequence respectively. All attempts were
unsuccessful using either leaf or root cDNA as template. The PCR was therefore repeated
with either the 5'- or 3'-most primer and complementary primers along the length of the
SBE II gene to determine the size of the largest fragment that could be amplified. With
the CSBE214 primer, fragments could be amplified using primers 210, 28, 27 and 22 in
order of increasing distance, the latter primer pair amplifying a 2.2 kb band. With the 3'
primer CSBE23, only primer pairs with 21 and 26 gave amplification products, the latter
being about 1200 bp. These results suggest that the original 3' RACE clone (pSJ94) is
derived from a different SBE II gene than the rest of the 5' RACE clones even though the
two largest PCR fragments (214 + 22 and 26 + 23) overlap by 750 bp and share several
primer sites. It is likely that the sequence of the two genes starts to diverge around the
CSBE22 primer site such that the 3' end of the corresponding gene does not contain the
23 primer and is not therefore able to amplify a cDNA when used with the 214 primer.
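
The reasoning in this paragraph rests on a simple rule: a primer pair gives a product only when both primer sites lie on the same template, and the product size is set by the distance between them. The sketch below illustrates that rule in Python on a 48-nucleotide excerpt of SEQ ID NO: 28 (nucleotides 963 to 1010), using CSBE212 as the forward primer and CSBE28 as the reverse primer. The function name `amplicon_length` and the exact-match search are assumptions made for illustration; real primers tolerate some mismatches and this is not the procedure used by the inventors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str):
    """Expected product length for a primer pair on a sense-strand template.

    Returns None when either primer has no exact site on the template,
    or when the reverse primer site lies upstream of the forward primer.
    """
    start = template.find(forward)
    site = template.find(reverse_complement(reverse))
    if start < 0 or site < 0 or site < start:
        return None
    return site + len(reverse) - start


# Excerpt of SEQ ID NO: 28 (nucleotides 963-1010), sense strand.
TEMPLATE = "TATGAGTCGCACGTTGGAATGAGTAGTACGGAGCCAGTAATTAACACA"
CSBE212 = "ATGAGTCGCACGTTGGAATG"  # forward (sense) primer
CSBE28 = "AATTACTGGCTCCGTACTAC"   # reverse (antisense) primer
CSBE23 = "TCCAGTCTCAATATACGTCG"   # reverse primer with no site in this excerpt

print(amplicon_length(TEMPLATE, CSBE212, CSBE28))  # 41
print(amplicon_length(TEMPLATE, CSBE212, CSBE23))  # None
```

A missing product, as with CSBE214 + CSBE23 in the text, corresponds to the `None` case: at least one of the two sites is absent from the template.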

To confirm this, the sequence of the longest 5' PCR fragment (214 + 22) from two clones
(#20, designated pSJ99, and #35) was determined and compared to the consensus sequence
csbe2con as shown in Figure 3. The first 2000 bases are nearly identical (the single base
changes might well be PCR errors); however, the consensus sequence is significantly
different after this. This region corresponds to the original 3' RACE fragment pSJ94
(SBE A + Ri adaptor) and provided evidence that there may be more than one SBE II
gene in cassava.

The 3' end corresponding to pSJ99 was therefore cloned as follows: 3' RACE PCR was
performed on leaf cDNA using the SBE A oligo as the gene specific primer so that all
SBE II genes would be amplified. The cloned DNA fragments were then screened for the
presence or absence of the CSBE23 primer by PCR. Two out of 15 clones were positive
with the SBE A + Ri primer pair but negative with SBE A + CSBE23 primers. The
sequence of these two clones (designated pSJ101, as shown in Figure 9) demonstrated that
they were indeed from an SBE II gene and that they were different from pSJ94. However,
the overlapping region of pSJ101 (the 3' clone) and pSJ99 (the 5' clone) was identical,
suggesting that they were derived from the same gene.

To confirm this a primer (CSBE218) was made to a region in the 3' UTR (untranslated
region) of pSJ101 and used in combination with the CSBE214 primer to recover by PCR a full
length cDNA from both leaf and root cDNA. These clones were sequenced and
designated pSJ106 & pSJ107 respectively. The sequence and predicted ORF of pSJ107
is shown in Figure 4. The long ORF in plasmid pSJ106 was found to be interrupted by
a stop codon (presumably introduced in the PCR process) approximately 1 kb from the 3'
end of the gene, therefore another cDNA clone (designated pSJ116) was amplified in a
separate reaction, cloned and sequenced. This clone had an intact ORF (data not shown).
There were only a few differences in these two sequences (in the transit peptide aa 27-41:
YRRTSSCLSFNFKEA to DRRTSSCLSFIFKKAA and L831 in pSJ107 to V in pSJ116
respectively).

An additional 740 bp of sequence of the gene corresponding to the pSJ94 clone was
isolated by 5' RACE using the primers CSBE216 and 217, and was designated pSJ125.
This sequence was combined with that of pSJ94 to form a consensus sequence "125 +
94", as shown in Figure 10. The sequence of this second gene is about …% identical at
the DNA and protein level to pSJ116, as shown in Figures 5 and 6, and is clearly a second
form of SBE II in cassava. The 3' untranslated regions of the two genes are not related
(data not shown).

It was also determined that the full length cassava SBE II genes (from both leaf and tuber) 
actually encode for active starch branching enzymes since the cloned genes were able to 
complement the glycogen branching enzyme deficient E. coli mutant KV832. 

Main Findings 

1) A full length cDNA clone of a starch branching enzyme II (SBE II) gene has been
cloned from leaves and starch storing roots of cassava. This cDNA encodes an 836 amino
acid protein (Mr 95 kD) and is 86% identical to pea SBE I over the central conserved
domain, although the level of sequence identity over the entire coding region is lower than
86%.

2) There is more than one SBE II gene in cassava as a second partial SBE II cDNA was
isolated which differs slightly in the protein coding region from the first gene and has no
homology in the 3' untranslated region.

3) The isolated full length cDNA from both leaves and roots encodes an active SBE as 
it complements an E. coli mutant deficient in glycogen branching enzyme as assayed by 
iodine staining. 

We have shown that there are SBE II (Class A) gene sequences present in the cassava
genome by isolating cDNA fragments using 3' and 5' RACE. From these cDNA
fragments a consensus sequence of over 3 kb could be compiled which contained one long
open reading frame (Figure 2) which is highly homologous to other SBE II (class A) genes
(data not shown). It is likely that the consensus sequence does not represent that of a
single gene since attempts to PCR a full length gene using primers at the 5' and 3' ends
of this sequence were not successful. In fact screening of a number of leaf derived 3'
RACE cDNAs showed that a second SBE II gene (clone designated pSJ101) was also
expressed which is highly homologous within the coding region to the originally isolated
cDNA (pSJ94) but has a different 3' UTR. A full length SBE II gene was isolated from
leaves and roots by PCR using a new primer to the 3' end of this sequence and the
original sequence at the 5' end of the consensus sequence. If the frequency of clones
isolated by 3' RACE PCR reflects the abundance of the mRNA levels then this full length
gene may be expressed at lower levels in the leaf than the pSJ94 clone (2 out of 15 were
the former class, 13 out of 15 the latter). It should be noted that each class is expressed in both
leaves and roots as judged by PCR (data not shown). Sequence analysis of the predicted
ORF of the leaf and root genes showed only a few differences (4 amino acid changes and
one deletion) which could have arisen through PCR errors or, alternatively, there may be
more than one nearly identical gene expressed in these tissues.

A comparison of all known SBE II protein sequences shows that the cassava SBE II gene
is most closely related to the pea gene (Figure 8). The two proteins are 86.3% identical
over a 686 amino acid range which extends from the triple proline "elbow" (Burton et al.,
1995 Plant J. 7, 3-15) to the conserved WYA sequence immediately preceding the
C-terminal extensions (data not shown). All SBE II proteins are conserved over this range
in that they are at least 80% similar to each other. Remarkably however, the sequence
conservation between the pea, potato and cassava SBE II proteins also extends to the
N-terminal transit peptide, especially the first 12 amino acids of the precursor protein and
the region surrounding the mature terminus of the pea protein (AKFSRDS). Because the
proteins are so similar around this region it can be predicted that the mature terminus of
the cassava SBE II protein is likely to be GKSSHES. The precursor has a predicted
molecular mass of 96 kD and the mature protein a predicted molecular mass of 91.3 kD.
The cassava SBE II has a short acidic tail at the C-terminus although this is not as long or
as acidic as that found in the pea or potato proteins. The significance of this acidic tail,
if any, remains to be determined. One notable difference between the amino acid
sequence of cassava SBE II and all other SBE II proteins is the presence of the sequence
NSKH at around position 697 instead of the conserved sequence DAD/EY. Although this
conserved region forms part of a predicted α-helix (number 8) of the catalytic (β/α)8 barrel
domain (Burton et al., 1995 cited previously), this difference does not abolish the SBE
activity of the cassava protein as this gene can still complement the glycogen branching
deletion mutant of E. coli. It may however affect the specificity of the protein. An
interesting point is that the other cassava SBE II clone pSJ94 has the conserved sequence
DADY.
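
The identity figures quoted here are percentages of matching positions over an alignment. The text does not state which alignment program was used, so the following Python sketch only illustrates the arithmetic on two equal-length, ungapped motifs taken from this paragraph: the mature terminus of the pea protein (AKFSRDS) and the predicted cassava terminus (GKSSHES). The function name `percent_identity` is an illustrative assumption.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences.

    Both sequences must already be aligned to the same length; this sketch
    does not insert gaps or perform any alignment itself.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


pea_terminus = "AKFSRDS"      # mature terminus of the pea protein
cassava_terminus = "GKSSHES"  # predicted mature terminus in cassava

identity = percent_identity(pea_terminus, cassava_terminus)
print(f"{identity:.1f}% identical over {len(pea_terminus)} residues")  # 42.9%
```

Three of the seven positions match (K, S and the final S), so short motifs such as this give much coarser values than the 686-residue range discussed above.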

One other point of interest concerning the sequence of the SBE II gene is the presence of
an upstream ATG in the 5' UTR. This ATG could initiate a small peptide of 42 amino
acids which would terminate downstream of the predicted initiating methionine codon of
the SBE II precursor. If this does occur then the translation of the SBE II protein from this
mRNA is likely to be inefficient as ribosomes normally initiate at the 5'-most ATG in the
mRNA. However, the first ATG is in a poorer Kozak context than the SBE II initiator and
it may be too close to the 5' end of the message to initiate efficiently (14 nucleotides), thus
allowing initiation to occur at the correct ATG.
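
The situation described, an upstream ATG whose short reading frame is out of frame with the main ORF and ends downstream of the main initiator, can be detected by scanning every ATG and reading to the first in-frame stop codon. The Python sketch below shows this on a hypothetical 21-nucleotide toy message (`TOY_MRNA`), which is not the cassava leader sequence; the function name `orfs_from_atgs` is likewise an illustrative assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orfs_from_atgs(mrna: str) -> list:
    """Return (start, frame, codons) for every ATG followed by an in-frame stop.

    start  : 0-based index of the A of the ATG
    frame  : start modulo 3
    codons : number of sense codons before the stop (ATG included)
    """
    found = []
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] != "ATG":
            continue
        # Read codon by codon from this ATG until a stop codon is met.
        for pos in range(start, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                found.append((start, start % 3, (pos - start) // 3))
                break
    return found


# Hypothetical toy message: an upstream ATG at index 1 and a main ATG at index 9.
TOY_MRNA = "CATGGCACCATGGTAAACTGA"

orfs = orfs_from_atgs(TOY_MRNA)
print(orfs)  # [(1, 1, 4), (9, 0, 3)]
```

In the toy message the upstream frame stops at index 13, after the main ATG at index 9, so the two reading frames overlap in the way the paragraph describes.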

In conclusion we have shown that cassava does have SBE II gene sequences, that they are
expressed in both leaves and tubers and that more than one gene exists.

Example 2 

Cloning of a second full length cassava SBE II gene 



Methods

Oligonucleotides

CSBE219   CTTTATCTATTAAAGACTTC    (Seq ID No. 20)
CSBE220   CAAAAAAGTTTGTGACATGG    (Seq ID No. 21)
CSBE221   TCACTTTTTCCAATGCTAAT    (Seq ID No. 22)
CSBE222   TCTCATGCAATGGAACCGAC    (Seq ID No. 23)
CSBE223   CAGATGTCCTGACTCGGAAT    (Seq ID No. 24)
CSBE224   ATTCCGAGTCAGGACATCTG    (Seq ID No. 25)
CSBE225   CGCATTTCTCGCTATTGCTT    (Seq ID No. 26)
CSBE226   CACAGGCCCAAGTGAAGAAT    (Seq ID No. 27)

The 5' end of the gene corresponding to the 3' RACE clone pSJ94 was isolated in three
rounds of 5' RACE. Prior to performing the first round of 5' RACE, 5 µg of total leaf
RNA was reverse transcribed in a 20 µl reaction using conditions as described by the
manufacturer (Superscript enzyme, BRL) and 10 pmol of the SBE II gene specific primer
CSBE23. Primers were then removed and the cDNA tailed with dATP as described
above. The first round of 5' RACE used primers CSBE216 and Ro. This PCR reaction
was diluted 1:20 and used as a template for a second round of amplification using primers
CSBE217 and Ri. The gene specific primers were designed so that they would
preferentially hybridise to the SBE II sequence in pSJ94. Amplified products appeared
as a smear of approximately 600-1200 bp when subjected to electrophoresis on a 1% TAE
agarose gel.

This smear was excised and DNA purified using a Qiaquick column (Qiagen) before
ligation to the pT7Blue vector. Several clones were sequenced and clone #1 was
designated pSJ125. New primers (CSBE219 and 220) were designed to hybridise to the
5' end of pSJ125 and a second round of 5' RACE was performed using the same CSBE23
primed library. Two fragments of 600 and 800 bp were cloned and sequenced (clones
13 & 17). Primers CSBE221 and 222 were designed to hybridise to the 5' sequence of the
longest clone (#13) and a third round of 5' RACE was performed on a new library (5 µg
total leaf RNA reverse transcribed with Superscript using CSBE220 as primer and then
dATP tailed with TdT from Boehringer Mannheim). Fragments of approximately 500 bp
were amplified, cloned and sequenced. Clone #13 was designated pSJ143. The process
is illustrated schematically in Figure 12.

To isolate a full length gene as a contiguous sequence, a new primer (CSBE225) was
designed to hybridise to the 5' end of clone pSJ143 and used with one of the primers
(CSBE226 or 23) in the 3' end of clone pSJ94, in a PCR reaction using RoRidT17 primed
leaf cDNA as template. Use of primer CSBE226 resulted in production of Clone #2
(designated pSJ144), and use of primer CSBE23 resulted in production of Clones #10 and
13 (designated pSJ145 and pSJ146 respectively). Only pSJ146 was sequenced fully.

Results 

Isolation of a second full length cassava SBE II gene

A full length clone for a second SBE II gene was isolated by extending the sequence of
pSJ94 in three rounds of 5' RACE as illustrated schematically in Figure 12. In each
round of 5' RACE, primers were designed that would preferentially hybridise to the new
sequence rather than to the gene represented by pSJ116. In the final round of 5' RACE,
three clones were obtained that had the initiating methionine codon, and none of these had
upstream ATGs. The overlapping cDNA fragments (sequences of the 5' RACE clones
pSJ143, 13, pSJ125 and the 3' RACE clone pSJ94) could be assembled into a consensus
sequence of approximately 3 kb which was designated csbe2-2.seq. This sequence
contained one long ORF with a predicted size of 848 aa (Mr 97 kDa). The full length
gene was then isolated as a contiguous sequence by PCR amplification from RoRidT17
primed leaf cDNA using primers at the 5' (CSBE225) and 3' (CSBE23 or CSBE226) ends
of the RACE clones. One clone, designated pSJ146, was sequenced and the restriction
map is shown along with the predicted amino acid sequence in Figure 13.

Sequence homologies between SBE II genes 

The two cassava genes (pSJ116 and pSJ146) share 88.8% identity at the DNA level over
the entire coding region (data not shown). The homology extends about 50 bases outside
of this region but beyond this the untranslated regions show no similarity (data not
shown). At the protein level the two genes show 86.2% identity over the entire ORF (data
not shown). The two genes are more closely related to each other than to any other SBE
II. Between species, the pea SBE I shows the most homology to the cassava SBE II
genes.

Example 3 

Construction of plant transformation vectors and transformation of cassava with
antisense starch branching enzyme genes.

This example describes in detail how a portion of the SBE II gene isolated from cassava 
may be introduced into cassava plants to create transgenic plants with altered properties. 

An 1100 bp Hind III - Sac I fragment of cassava SBE II (from plasmid pSJ94) was cloned
into the Hind III - Sac I sites of the plant transformation vector pSJ64 (Figure 11). This
placed the SBE II gene in an antisense orientation between the 2X 35S CaMV promoter
and the nopaline synthase polyadenylation signal. pSJ64 is a derivative of the binary
vector pGPTV-HYG (Becker et al., 1992 Plant Molecular Biology 20: 1195-1197)
modified by inclusion of an approximately 750 bp fragment of pJIT60 (Guerineau et al.,
1992 Plant Mol. Biol. 18, 815-818) containing the duplicated cauliflower mosaic virus
(CaMV) 35S promoter (Cabb-JI strain, equivalent to nucleotides 7040 to 7376 duplicated
upstream of 7040 to 7433, as described by Frank et al., 1980 Cell 21, 285-294) to replace
the GUS coding sequence. A similar construct was made with the cassava SBE II
sequence from plasmid pSJ101.

These plasmids are then introduced into Agrobacterium tumefaciens LBA4404 by a direct
DNA uptake method (An et al., Binary vectors. In: Plant Molecular Biology Manual (ed.
Gelvin and Schilperoort) A3, 1988, pp 1-19) and can be used to transform cassava somatic
embryos by selecting on hygromycin as described by Li et al. (1996, Nature
Biotechnology 14, 736-740).

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: 

(A) NAME: National Starch and Chemical Investment Holding Corporation

(B) STREET: Suite 27, 501 Silverside Road

(C) CITY: Wilmington 

(D) STATE: Delaware 

(E) COUNTRY: USA 

(F) POSTAL CODE (ZIP): 19809 

(ii) TITLE OF INVENTION: Improvements in or Relating to Starch Content of Plants

(iii) NUMBER OF SEQUENCES: 31 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 
ATGGACAAGG ATATGTATGA 20 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GGTTTCATGA CTTCTGAGCA 20



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TGCTCAGAAG TCATGAAACC 20

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TCCAGTCTCA ATATACGTCG 20 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AGGAGTAGAT GGTCTGTCGA 20 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TCATACATAT CCTTGTCCAT 20 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GGGTGACTTC AATGATGTAC 20

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGTGTACATC ATTGAAGTCA 20



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AATTACTGGC TCCGTACTAC 20



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CATTCCAACG TGCGACTCAT 20



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
TACCGGTAAT CTAGGTGTTG 20



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GGACCTTGGT TTAGATCCAA 20 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ATGAGTCGCA CGTTGGAATG 20 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CAACACCTAG ATTACCGGTA 20

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TTAGTTGCGT CAGTTCTCAC 20 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
AATATCTATC TCAGCCGGAG 20



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
ATCTTAGATA GTCTGCATCA 20



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TGGTTGTTCC CTGGAATTAC 20



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TGCAAGGACC GTGACATCAA 20



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CTTTATCTAT TAAAGACTTC 20 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
CAAAAAAGTT TGTGACATGG 20 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TCACTTTTTC CAATGCTAAT 20 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TCTCATGCAA TGGAACCGAC 20 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
CAGATGTCCT GACTCGGAAT 20 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATTCCGAGTC AGGACATCTG 20 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CGCATTTCTC GCTATTGCTT 20 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
CACAGGCCCA AGTGAAGAAT 20

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2588 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 21..2531

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTCTCTAACT TCTCAGCGAA ATG GGA CAC TAC ACC ATA TCA GGA ATA CGT 50
Met Gly His Tyr Thr Ile Ser Gly Ile Arg
1 5 10

TTT CCT TGT GCT CCA CTC TGC AAA TCT CAA TCT ACC GGC TTC CAT GGC 98
Phe Pro Cys Ala Pro Leu Cys Lys Ser Gln Ser Thr Gly Phe His Gly
15 20 25

TAT CGG AGG ACC TCC TCT TGC CTT TCC TTC AAC TTC AAG GAG GCG TTT 146
Tyr Arg Arg Thr Ser Ser Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe
30 35 40

TCT AGG AGG GTC TTC TCT GGA AAG TCA TCT CAT GAA TCT GAC TCC TCA 194
Ser Arg Arg Val Phe Ser Gly Lys Ser Ser His Glu Ser Asp Ser Ser
45 50 55

AAT GTA ATG GTC ACT GCT TCT AAA AGA GTC CTT CCT GAT GGT CGG ATT 242
Asn Val Met Val Thr Ala Ser Lys Arg Val Leu Pro Asp Gly Arg Ile
60 65 70

GAA TGC TAT TCT TCT TCA ACA GAT CAA TTG GAA GCC CCT GGC ACA GTT 290
Glu Cys Tyr Ser Ser Ser Thr Asp Gln Leu Glu Ala Pro Gly Thr Val
75 80 85 90

TCA GAA GAA TCC CAG GTG CTT ACT GAT GTT GAG AGT CTC ATT ATG GAT 338
Ser Glu Glu Ser Gln Val Leu Thr Asp Val Glu Ser Leu Ile Met Asp
95 100 105

GAT AAG ATT GTT GAA GAT GAA GTA AAT AAA GAA TCT GTT CCA ATG CGG 386
Asp Lys Ile Val Glu Asp Glu Val Asn Lys Glu Ser Val Pro Met Arg
110 115 120

GAG ACA GTT AGC ATC AGA AAA ATT GGA TCT AAA CCA AGG TCC ATT CCT 434
Glu Thr Val Ser Ile Arg Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro
125 130 135

CCA CCC GGC AGA GGG CAA AGA ATA TAT GAC ATA GAT CCA AGC TTG ACA 482
Pro Pro Gly Arg Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr
140 145 150

GGC TTT CGT CAA CAC CTA GAT TAC CGG TAT TCA CAG TAC AAA AGA CTC 530
Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu
155 160 165 170

CGA GAA GAA ATT GAC AAG TAT GAA GGT AGT CTG GAT GCA TTT TCT CGT 578
Arg Glu Glu Ile Asp Lys Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg
175 180 185

GGC TAT GAA AAG TTT GGT TTC TCA CGC AGT GAA ACA GGA ATA ACT TAT 626
Gly Tyr Glu Lys Phe Gly Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr
190 195 200

AGA GAG TGG GCA CCA GGA GCT ACG TGG GCT GCA TTG ATT GGA GAT TTC 674
Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe
205 210 215

AAT AAC TGG AAT CCT AAT GCA GAT GTC ATG ACT CAG AAT GAG TGT GGT 722
Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Gln Asn Glu Cys Gly
220 225 230

GTC TGG GAG ATC TTT TTG CCG AAT AAT GCA GAT GGT TCA CCA CCA ATT 770
Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile
235 240 245 250

CCC CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT GGC AAC 818
Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Asn
255 260 265

AAA GAT TCT ATT CCT GCT TGG ATC AAG TTC TCA GTT CAA GCA CCA GGT 866
Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly
270 275 280

GAA CTC CCA TAT AAT GGC ATA TAC TAT GAT CCT CCC GAG GAG GAG AAG 914
Glu Leu Pro Tyr Asn Gly Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys
285 290 295

TAT GTG TTC AAA AAT CCT CAG CCA AAG AGA CCA AAA TCA CTT CGG ATT 962
Tyr Val Phe Lys Asn Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile
300 305 310

TAT GAG TCG CAC GTT GGA ATG AGT AGT ACG GAG CCA GTA ATT AAC ACA 1010
Tyr Glu Ser His Val Gly Met Ser Ser Thr Glu Pro Val Ile Asn Thr
315 320 325 330

TAT GCC AAC TTT AGA GAT GAT GTG CTT CCT CGC ATC AAA AAG CTT GGC 1058
Tyr Ala Asn Phe Arg Asp Asp Val Leu Pro Arg Ile Lys Lys Leu Gly
335 340 345

TAC AAT GCT GTT CAG CTC ATG GCT ATT CAA GAG CAT TCA TAT TAT GCT 1106
Tyr Asn Ala Val Gln Leu Met Ala Ile Gln Glu His Ser Tyr Tyr Ala
350 355 360

AGT TTT GGG TAT CAC GTC ACA AAC TTT TAT GCA GCT AGC AGC CGA TTT 1154
Ser Phe Gly Tyr His Val Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe
365 370 375

GGA ACT CCT GAT GAT TTA AAG TCT CTA ATA GAT AAA GCT CAC GAG TTA 1202
Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu
380 385 390

GGT CTT CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCA TCA ACT AAT 1250
Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Thr Asn
395 400 405 410

ACG TTG GAT GGG CTG AAT ATG TTT GAT GGT ACG GAT GGT CAC TAC TTT 1298
Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Gly His Tyr Phe
415 420 425

CAC TCT GGA CCA CGG GGT CAT CAT TGG ATG TGG GAC TCT CGC CTT TTC 1346
His Ser Gly Pro Arg Gly His His Trp Met Trp Asp Ser Arg Leu Phe
430 435 440

AAC TAT GGG AGC TGG GAG GTT CTA AGG TTT CTT CTT TCA AAT GCA AGG 1394
Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg
445 450 455
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TGG TGG TTG GAT GAG TAC AAG TTT GAT GGG TTC AGA TTT GAT GGG GTG 1442 
Trp Trp Leu Asp Glu Tyr tys Phe Asp Gly Phe Arg Phe Asp Gly Val 
460 465 470 

ACT TCA ATG ATG TAC ACC CAT CAT GGA TTG CAG GTA GAT TTT. ACC GGC 1490 
Thr Ser Met Met Tyr Thr His His Gly Leu Gin val Asp Phe Thr Gly 
475 480 485 490 

AAC TAC ,VT GAA TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT GTG GTT 1538.. 
Asn ;Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Val 
495 500 50s 

TAT TTG ATG CTG TTG AAT GAT ATG ATT CAT GGT CTC TTC CCA GAG GCT 1586 
Tyr Leu Met Leu Leu Asn Asp Met He His Gly Leu Phe Pro Glu Aia 
510 515 520 

GTC ACC ATT GGT GAA GAT GTT AGT GGA ATG CCA ACA GTT TGC ATT CCG 1634 
Val Thr He Gly Glu Asp Val Ser Gly Met Pro Thr Val Cys He Pre 
525 530 535 

GTT GAA GAT GGT GGT GTT GGC TTT GAT TAT CGT CTC CAC ATG GCT GTT 1682 
Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Val 
540 545 550 

GCT GAT AAA TGG GTT GAG ATT ATT CAG AAG AGA GAT GAA GAT TGG AAA 1730 
Ala Asp Lys Trp Val Glu He He Gin Lys Arg Asp Glu Asp Trp Lys 
555 560 565 s /u 

ATG GGT GAC ATT GTA CAT ATG CTG ACC AAC AGG CGG TGG TTG GAA AAG 1778 
Met Gly Asp lie Val His Met Leu Thr Asn Arg Arg Trp Leu Giu Lys 
575 580 1 585 

TGT GTT TCT TAT GCT GAA AGT CAT GAC CAG GCC CTT GTT GGT GAC AAA 1826 
Cvs Val Ser Tyr Ala Glu Ser His Asp Gin Ala Leu Val Gly Asp Lys 
590 595 600 

ACT ATT GCA TTT TGG CTG ATG GAC AAG GAT ATG TAT GAC TTC ATG GCT 1874 
Thr He Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala 
605 610 615 

CTT GAC AGA CCA TCT ACT CCT CTC ATA GAT CGT GGA GTA GCA TTG CAC 1922 
Leu A^p Arg Pro Ser Thr Pro Leu He Asp Arg Gly Val Ala Leu his 
620 625 630 

AAA ATG ATC AGG CTT ATT ACC ATG GGA TTA GGC GGA GAA GGA TAT TTG 1970 
Lys Met lie Ara Leu He Thr Met Gly Leu Gly Gly Glu Gly iyr Leu 
635 " 640 645 6d0 

AAT TTT ATG GGA AAT GAA TTT GGA CAC CCC GAG TGG ATT GAT TTT CCA 2018 
Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp lie Asp Phe Pre 
655 660 

AGA GGT GAT CTA CAT CTT CCC AGT GGT AAA TTT GTT CCT GGG AAC AAT 2066 
Arg Gly Asp Leu His Leu Pro Ser Gly Lys Phe Val Pro Gly Asn Asn 
670 675 680 
TAC AGT TAT GAT AAA TGC CGG CGT AGG TTT GAT CTA GGC AAT TCA AAG 2114 
Tyr Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys 
685 690 695 

CAT CTG AGA TAT CAT GGA ATG CAA GAG TTT GAT CAA GCA ATT CAG CAT 2162 
His Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Ile Gln His 
700 705 710 

CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAA TAC ATA TCA 2210 
Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser 
715 720 725 730 

CGG AAG GAT GAA AGG GAT CGG ATC ATT GTC TTC GAG AGG GGA AAC CTC 2258 
Arg Lys Asp Glu Arg Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu 
735 740 745 

GTT TTT GTA TTC AAT TTT CAT TGG ACT AGC AGC TAT TCG GAT TAC CGA 2306 
Val Phe Val Phe Asn Phe His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg 
750 755 760 

GTT GGC TGC TTA AAG CCA GGA AAG TAC AAG ATA GTC TTG GAT TCA GAT 2354 
Val Gly Cys Leu Lys Pro Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp 
765 770 775 

GAT CCT TTG TTT GGA GGC TTT GGC AGG CTT AGT CAT GAT GCA GAG CAC 2402 
Asp Pro Leu Phe Gly Gly Phe Gly Arg Leu Ser His Asp Ala Glu His 
780 785 790 

TTC AGC TTT GAA GGG TGG TAC GAT AAC CGG CCT CGA TCC TTC ATG GTG 2450 
Phe Ser Phe Glu Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val 
795 800 805 810 

TAC ACA CCA TGT AGA ACA GCA GTG GTC TAT GCT TTA GTG GAG GAT GAA 2498 
Tyr Thr Pro Cys Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu 
815 820 825 

GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA GATATATCTT AACAACAGGT 2551 
Val Glu Asn Glu Leu Glu Pro Val Ala Gly * 
830 835 

TCTGAAGCAG GAATGCCATT ATTGATCTTC CTATGTT 
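
The pairing of codons and three-letter residues in the listing above follows the standard genetic code. As an illustrative check only (not part of the original disclosure), the following Python sketch translates the final codons of the coding region shown above; the names `CODON_TABLE` and `translate` are hypothetical helpers introduced here, and the standard nuclear genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string (whitespace ignored) until the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks an unreadable codon
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Last eleven codons of the coding region, copied from the listing above.
last_codons = "GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA"
print(translate(last_codons))  # VENELEPVAG*
```

The one-letter output corresponds to Val Glu Asn Glu Leu Glu Pro Val Ala Gly followed by the stop codon, as printed in the listing.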

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 837 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Gly His Tyr Thr Ile Ser Gly Ile Arg Phe Pro Cys Ala Pro Leu 
1 5 10 15 



Cys Lys Ser Gln Ser Thr Gly Phe His Gly Tyr Arg Arg Thr Ser Ser 
20 25 30 

Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe Ser Arg Arg Val Phe Ser 
35 40 45 

Gly Lys Ser Ser His Glu Ser Asp Ser Ser Asn Val Met Val Thr Ala 
50 55 60 

Ser Lys Arg Val Leu Pro Asp Gly Arg Ile Glu Cys Tyr Ser Ser Ser 
65 70 75 80 

Thr Asp Gln Leu Glu Ala Pro Gly Thr Val Ser Glu Glu Ser Gln Val 
85 90 95 

Leu Thr Asp Val Glu Ser Leu Ile Met Asp Asp Lys Ile Val Glu Asp 
100 105 110 

Glu Val Asn Lys Glu Ser Val Pro Met Arg Glu Thr Val Ser Ile Arg 
115 120 125 

Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro Pro Pro Gly Arg Gly Gln 
130 135 140 

Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr Gly Phe Arg Gln His Leu 
145 150 155 160 

Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg Glu Glu Ile Asp Lys 
165 170 175 

Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg Gly Tyr Glu Lys Phe Gly 
180 185 190 

Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr Arg Glu Trp Ala Pro Gly 
195 200 205 

Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asn Pro Asn 
210 215 220 

Ala Asp Val Met Thr Gln Asn Glu Cys Gly Val Trp Glu Ile Phe Leu 
225 230 235 240 

Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro His Gly Ser Arg Val 
245 250 255 

Lys Ile Arg Met Asp Thr Pro Ser Gly Asn Lys Asp Ser Ile Pro Ala 
260 265 270 

Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu Leu Pro Tyr Asn Gly 
275 280 285 

Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys Tyr Val Phe Lys Asn Pro 
290 295 300 

Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr Glu Ser His Val Gly 
305 310 315 320 
Met Ser Ser Thr Glu Pro Val Ile Asn Thr Tyr Ala Asn Phe Arg Asp 
325 330 335 

Asp Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Val Gln Leu 
340 345 350 

Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Pre Gly Tyr His Val 
355 360 365 

Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe Gly Thr Pro Asp Asp Leu 
370 375 380 

Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly Leu Leu Val Leu Met 
385 390 395 400 

Asp Ile Val His Ser His Ala Ser Thr Asn Thr Leu Asp Gly Leu Asn 
405 410 415 

Met Phe Asp Gly Thr Asp Gly His Tyr Phe His Ser Gly Pro Arg Gly 
420 425 430 

His His Trp Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Ser Trp Glu 
435 440 445 

Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp Glu Tyr 
450 455 460 

Lys Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met Tyr Thr 
465 470 475 480 

His His Gly Leu Gln Val Asp Phe Thr Gly Asn Tyr Asn Glu Tyr Phe 
485 490 495 

Gly Tyr Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu Leu Asn 
500 505 510 

Asp Met Ile His Gly Leu Phe Pro Glu Ala Val Thr Ile Gly Glu Asp 
515 520 525 

Val Ser Gly Met Pro Thr Val Cys Ile Pro Val Glu Asp Gly Gly Val 
530 535 540 

Gly Phe Asp Tyr Arg Leu His Met Ala Val Ala Asp Lys Trp Val Glu 
545 550 555 560 

Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys Met Gly Asp Ile Val His 
565 570 575 

Met Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys Val Ser Tyr Ala Glu 
580 585 590 

Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe Trp Leu 
595 600 605 

Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro Ser Thr 
610 615 620 
Pro Leu Ile Asp Arg Gly Val Ala Leu His Lys Met Ile Arg Leu Ile 
625 630 635 640 

Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly Asn Glu 
645 650 655 

Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Gly Asp Leu His Leu 
660 665 670 

Pro Ser Gly Lys Phe Val Pro Gly Asn Asn Tyr Ser Tyr Asp Lys Cys 
675 680 685 

Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys His Leu Arg Tyr His Gly 
690 695 700 

Met Gln Glu Phe Asp Gln Ala Ile Gln His Leu Glu Glu Ala Tyr Gly 
705 710 715 720 

Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg Lys Asp Glu Arg Asp 
725 730 735 

Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val Phe Val Phe Asn Phe 
740 745 750 

His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg Val Gly Cys Leu Lys Pro 
755 760 765 

Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp Pro Leu Phe Gly Gly 
770 775 780 

Phe Gly Arg Leu Ser His Asp Ala Glu His Phe Ser Phe Glu Gly Trp 
785 790 795 800 

Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr Thr Pro Cys Arg Thr 
805 810 815 

Ala Val Val Tyr Ala Leu Val Glu Asp Glu Val Glu Asn Glu Leu Glu 
820 825 830 

Pro Val Ala Gly * 
835 
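
Claims 5 and 14 refer to the amino acid sequence NSKH "at about residue 697". As an illustrative sketch only, the Python below locates that motif in residues 689 to 704 of SEQ ID NO: 29 as printed above. The helper names, and the tolerance of five residues used to interpret "about", are assumptions introduced here and are not defined in the specification.

```python
# Three-letter to one-letter amino acid codes (the 20 standard residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())


def find_motif_near(seq, motif, target, first_position=1, window=5):
    """Return start positions of `motif` within `window` residues of `target`.

    `first_position` is the residue number of seq[0]; `window` is an assumed
    tolerance for the word "about".
    """
    hits = []
    start = seq.find(motif)
    while start != -1:
        position = start + first_position
        if abs(position - target) <= window:
            hits.append(position)
        start = seq.find(motif, start + 1)
    return hits


# Residues 689-704 of SEQ ID NO: 29, copied from the listing above.
fragment = "Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys His Leu Arg Tyr His Gly"
hits = find_motif_near(to_one_letter(fragment), "NSKH", target=697, first_position=689)
print(hits)  # [696]
```

In this fragment the motif spans residues 696 to 699, so residue 697 falls within it.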

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2805 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 131..2677 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
AGTGAATTCG AGCTCGGTAC CCGGGGATCC GATTCGCATT TCTCGCTATT GCTTTCCGTT 60 

TATTTCCATA TATAAAATAT CAAATCTAAT CACTTGCGCC ATTTCTATCT CTCTCCAAAC 120 

TCTCACCGAA ATG GTA TAC TAG ACT GTA TCA GGC ATA CGT TTT OCT TGI 169 
Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys 
840 845 850 

GCA CCT TCA CTC TAC AAA TCT CAG CTC ACC AGC TTC CAT GGC GGT CGA 217 
Ala Pro Ser Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg 
855 860 865 

AGG ACC TCT TCT GGC CT TCC TTC CTC TTG AAG AAG GAG CTG TTT CCT 265 
Arg Thr Ser Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro 
870 875 880 

CGG AAG ATC TTT GCT GGA AAG TCC TCT TAT GAA TCT GAC TCC TCA AAT 313 
Arg Lys Ile Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn 
885 890 895 

TTA ACT GTC TCT GCA TCT GAG AAG GTC CTT GTT CCT GAT GAT CAG ATT 361 
Leu Thr Val Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile 
900 905 910 

GAT GGC TCT TCT TCT TCA ACA TAT CAA TTA GAA ACC ACT GGC ACA GTT 409 
Asp Gly Ser Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val 
915 920 925 930 

TTG GAG GAA TCC CAG GTT CTT GGT GAT GCA GAG AGT CTT GTG ATG GAA 457 
Leu Glu Glu Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu 
935 940 945 

GAT GAT AAG AAT GTT GAG GAG GAT GAA GTA AAA AAA GAG TCG GTT CCA 505 
Asp Asp Lys Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro 
950 955 960 

TTG CAT GAG ACA ATT AGC ATT GGA AAA AGT GAA TCT AAA CCA AGG TCC 553 
Leu His Glu Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser 
965 970 975 

ATT CCT CCA CCT GGC AGT GGG CAG AGA ATA TAT GAC ATA GAT CCA AGC 601 
Ile Pro Pro Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser 
980 985 990 

TTG GCA GGT TTC CGT CAG CAT CTT GAC TAC CGA TAT TCA CAG TAC AAA 649 
Leu Ala Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys 
995 1000 1005 1010 

AGG CTG CGT GAG GAA ATT GAC AAG TAT GAA GGT GGT TTG GAT GCA TTC 697 
Arg Leu Arg Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe 
1015 1020 1025 

TCT CGT GGA TTT GAA AAG TTT GGT TTC TTA CGC AGT GAA ACA GGA ATA 745 
Ser Arg Gly Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile 
1030 1035 1040 
ACT TAT AGG GAA TGG GCA CCT GGA GCT ACG TGG GCT GCA CTT Am GGA 793 
Thr Tyr Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly 
1045 1050 1055 

GAT TTC AAC AAT TGG AAT CCT AAT GCA GAT GTC ATG ACT CGG AAT GAG 841 
Asp Phe Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu 
1060 1065 1070 

TTT GGT GTC TGG GAG ATT TTT TTG CCA AAT AAC GCA GAT GGT TCA CCA 889 
Phe Gly Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro 
1075 1080 1085 1090 

CCA ATT CCT CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT 937 
Pro Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser 
1095 1100 1105 

GGC ATC AAA GAT TCA ATT CCT GCT TGG ATC AAG TTC TCA GTT CAG GCA 985 
Gly Ile Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala 
1110 1115 1120 

CCT GGT GAA ATC CCA TAC AAT GCC ATA TAC TAT GAT CCA CCA AAG GAG 1033 
Pro Gly Glu Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu 
1125 1130 1135 

GAG AAG TAT GTG UC AAA CAT CCT CAG CCA AAG AGA CCA AAA TCA CTT 1081 
Glu Lys Tyr Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu 
1140 1145 1150 

AGG ATT TAT GAA TCT CAT GTT GGG ATG AGT AGT ATG GAG CCA ATA ATT 1129 
Arg Ile Tyr Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile 
1155 1160 1165 1170 

AAC ACA TAT GCC AAC TTT AGA GAT GAT ATG CTT CCT CGC ATC AAA AAG 1177 
Asn Thr Tyr Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys 
1175 1180 1185 

CTT GGC TAC AAT GCT GTT CAG ATC ATG GCT ATT CAA GAG CAT TCC TAT 1225 
Leu Gly Tyr Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr 
1190 1195 1200 

TAT GCT AGT TTT GGG TAC CAT GTC ACA AAC TTT TTT GCA CCT AGC AGC 1273 
Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser 
1205 1210 1215 

CGA TTT GGA ACT CCT GAT GAT TTG AAG TCT TTA ATA GAT AAA GCT CAT 1321 
Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His 
1220 1225 1230 

GAG TTA GGG CTG CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCG TCA 1369 
Glu Leu Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser 
1235 1240 1245 1250 

AAT AAT ACG TTG GAT GGG CTG AAC ATG TTT GAT GGT ACG GAT AGT CAC 1417 
Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His 
1255 1260 1265 
TAC TTC CAC TCC GGA TCA CGG GGT CAT CAT TGG TTG TGG GAC TCT CGC 1465 
Tyr Phe His Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg 
1270 1275 1280 

CTT TTC AAC TAT GGA AGC TGG GAG GTG CTA AGA TTT CTT CTT TCA AAT 1513 
Leu Phe Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn 
1285 1290 1295 

GCA AGA TGG TGG TTG GAA GAG TAC AGG TTT GAT GGT TTT AGA TTT GAT 1561 
Ala Arg Trp Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp 
1300 1305 1310 

GGG GTG ACT TCC ATG ATG TAC ACT CCC CAT GGG TTG CAG GTA GCT TTT 1609 
Gly Val Thr Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe 
1315 1320 1325 1330 

ACT GGC AAC TAC AAT GAG TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT 1657 
Thr Gly Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala 
1335 1340 1345 

GTG ATT TAT TTG ATG CTT GTG AAT GAT ATG ATT CAC GGT CTT TTC CCT 1705 
Val Ile Tyr Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro 
1350 1355 1360 

GAG GCT GTT ACC ATT GGT GAA GAT GTT AGC GGA AAG CCA ACA TTT TGC 1753 
Glu Ala Val Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys 
1365 1370 1375 

ATT CCA GTG GAA GAT GGT GGT GTT GGA TTT GAT TAC CGT CTC CAC ATG 1801 
Ile Pro Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met 
1380 1385 1390 

GCC ATT GCC GAT AAA TGG ATT GAG ATT CTT AAG AAG AGA GAT GAG GAC 1849 
Ala Ile Ala Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp 
1395 1400 1405 1410 

TGG AAA ATG GGT GAC ATT GTG CAT ACA CTC ACC AAC AGA AGG TGG TTG 1897 
Trp Lys Met Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu 
1415 1420 1425 

GAA AAA TGT GTT GCT TAT GCT GAA AGT CAT GAC CAA GCT CTT GTT GGT 1945 
Glu Lys Cys Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly 
1430 1435 1440 

GAC AAA ACT ATT GCA TTT TGG CTG ATG GAC AAG GAC ATG TAC GAC TTC 1993 
Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe 
1445 1450 1455 

ATG GCT CGT GAC AGA CCA TCT ACT CCT CTT ATA GAT CGT GGA ATA GCA 2041 
Met Ala Arg Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala 
1460 1465 1470 

TTG C J AAA ATG ATC AGG CTT ATT ACC ATG GGC TTA GGC GGA GAA GGA 2089 
Leu His Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly 
1475 1480 1485 1490 
TAT TTG AAT TTT ATG GGA AAT GAA TTT GGA CAT CCT GAG TGG ATT GAT 2137 
Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp 
1495 1500 1505 

TTT CCA AGA GGG GAT CGA CAT CTG CCC AAT GGT AAA GTA ATT CCA GGG 2185 
Phe Pro Arg Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly 
1510 1515 1520 

AAC AAC CAC AGT TAT GAT AAA TGC CGT CGT AGA TTT GAT CTA GGT GAT 2233 
Asn Asn His Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp 
1525 1530 1535 

GCA GAC TAT CTA AGA TAT CAT GGA ATG CAA GAG TTT GAT CAG GCA ATG 2281 
Ala Asp Tyr Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met 
1540 1545 1550 

CAA CAT CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAG TAT 2329 
Gln His Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr 
1555 1560 1565 1570 

ATA TCA CGG AAG GAT GAA GGA GAT CGG ATC ATT GTC TTT GAG AGG GGA 2377 
Ile Ser Arg Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly 
1575 1580 1585 

AAC CTT GTT TTT GTA TTC AAC TTT CAT TGG ACT AAC AGC TAT TCA GAT 2425 
Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp 
1590 1595 1600 

TAC CGA GTT GGC TGC TTC AAG TCA GGA AAG TAC AAG ATT GTT TTG GAC 2473 
Tyr Arg Val Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp 
1605 1610 1615 

TCG GAT GAT GGC TTG TTT GGA GGC TTC AAC AGG CTT AGT CAT GAT GCC 2521 
Ser Asp Asp Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala 
1620 1625 1630 

GAG CAC TTC ACC TTT GAC GGG TGG TAT GAT AAC CGG CCT CGG TCC TTC 2569 
Glu His Phe Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe 
1635 1640 1645 1650 

ATG GTA TAT GCA CCA TCT AGG ACA GCA GTG GTC TAT GCT TTA GTA GAA 2617 
Met Val Tyr Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu 
1655 1660 1665 

GAT GAA GAG AAT GAA GCA GAG AAT GAA GTA GAA AGT GAA GTG AAA CCA 2665 
Asp Glu Glu Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro 
1670 1675 1680 

GCC TCC GGC TGA GATAGATATT TAGTAAGAGG ATCCCCTAAA GCAGGAATGG 2717 
Ala Ser Gly * 
1685 

TTAACCTGTG CATCTGCATT GAACGACGTA TATTGAGACT GGAAATCCAT ATGACTAGTA 2777 
GATCCTCTAG AGTCGACCTG CAGGCATG 2805 
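
The coding-region coordinates quoted in this document can be checked arithmetically against the stated polypeptide lengths. The sketch below is illustrative only; `cds_codon_count` is a hypothetical helper, and the coordinates are taken as 1-based and inclusive, which is the usual convention for such feature locations.

```python
def cds_codon_count(start, end):
    """Number of codons in a coding region with 1-based inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of three")
    return length // 3


# CDS location given for SEQ ID NO: 30 (131..2677).
print(cds_codon_count(131, 2677))  # 849

# Nucleotides 21-2531 recited in claim 2 for the Figure 4 sequence.
print(cds_codon_count(21, 2531))  # 837
```

The first value agrees with the length of 849 stated for SEQ ID NO: 31 and the second with the length of 837 stated for SEQ ID NO: 29. In the listing above, the final codon of the 131..2677 region is the TGA stop codon, printed as `*`.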
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 849 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys Ala Pro Ser 
1 5 10 15 

Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg Arg Thr Ser 
20 25 30 

Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro Arg Lys Ile 
35 40 45 

Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn Leu Thr Val 
50 55 60 

Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile Asp Gly Ser 
65 70 75 80 

Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val Leu Glu Glu 
85 90 95 

Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu Asp Asp Lys 
100 105 110 

Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro Leu His Glu 
115 120 125 

Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser Ile Pro Pro 
130 135 140 

Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Ala Gly 
145 150 155 160 

Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg 
165 170 175 

Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe Ser Arg Gly 
180 185 190 

Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile Thr Tyr Arg 
195 200 205 

Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn 
210 215 220 

Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu Phe Gly Val 
225 230 235 240 
Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro 
245 250 255 

His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Ile Lys 
260 265 270 

Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu 
275 280 285 

Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu Glu Lys Tyr 
290 295 300 

Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr 
305 310 315 320 

Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile Asn Thr Tyr 
325 330 335 

Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys Leu Gly Tyr 
340 345 350 

Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser 
355 360 365 

Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe Gly 
370 375 380 

Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly 
385 390 395 400 

Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn Thr 
405 410 415 

Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His Tyr Phe His 
420 425 430 

Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg Leu Phe Asn 
435 440 445 

Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp 
450 455 460 

Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp Gly Val Thr 
465 470 475 480 

Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe Thr Gly Asn 
485 490 495 

Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Ile Tyr 
500 505 510 

Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala Val 
515 520 525 

Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys Ile Pro Val 
530 535 540 
Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile Ala 
545 550 555 560 

Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp Trp Lys Met 
565 570 575 

Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys 
580 585 590 

Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr 
595 600 605 

Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala Arg 
610 615 620 

Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala Leu His Lys 
625 630 635 640 

Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn 
645 650 655 

Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg 
660 665 670 

Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly Asn Asn His 
675 680 685 

Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Asp Tyr 
690 695 700 

Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met Gln His Leu 
705 710 715 720 

Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg 
725 730 735 

Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val 
740 745 750 

Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp Tyr Arg Val 
755 760 765 

Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp 
770 775 780 

Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala Glu His Phe 
785 790 795 800 

Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr 
805 810 815 

Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu Glu 
820 825 830 

Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro Ala Ser Gly 
835 840 845 
Claims 

1. A nucleic acid sequence encoding a polypeptide having starch branching enzyme 
(SBE) activity, the encoded polypeptide comprising at least an effective portion of the 
amino acid sequence shown in Figure 4 or Figure 13. 

2. A nucleic acid sequence according to claim 1, comprising nucleotides 21-2531 of the 
nucleic acid sequence shown in Figure 4, or a functionally equivalent nucleotide sequence 
which hybridises under stringent hybridisation conditions with the nucleic acid sequence 
shown in Figure 4. 

3. A nucleic acid sequence according to claim 1, comprising nucleotides 131-2677 of the 
nucleic acid sequence shown in Figure 13, or a functionally equivalent sequence which 
hybridises under stringent hybridisation conditions with the nucleic acid sequence shown 
in Figure 13. 

4. A nucleic acid sequence according to any one of claims 1, 2 or 3 comprising a 5' 
and/or a 3' untranslated region. 

5. A nucleic acid sequence according to any one of the preceding claims, encoding a 
polypeptide having the amino acid sequence NSKH at about residue 697. 

6. A nucleic acid sequence comprising at least 200bp and exhibiting at least 88% 
sequence identity with the corresponding region of the DNA sequence shown in Figures 
4, 9, 10 or 13, operably linked in the sense or anti-sense orientation to a promoter 
operable in plants. 

7. A nucleic acid sequence according to claim 6, comprising at least 300-600bp. 

8. A sequence according to claim 6 or 7, comprising a 5' and/or 3' untranslated region. 

9. A sequence according to claim 8, comprising nucleotides 688-1044 of the sequence 
shown in Figure 9, and/or nucleotides 1507-1900 of the sequence shown in Figure 10. 

10. A sequence according to claim 6, comprising the nucleotide sequence shown in Figure 
10. 

11. A replicable nucleic acid construct comprising a nucleic acid sequence according to 
any one of the preceding claims. 

12. A polypeptide having SBE activity and comprising an effective portion of the amino 
acid sequence shown in Figure 4 or Figure 13. 

13. A polypeptide according to claim 12, in substantial isolation from other polypeptides. 

14. A polypeptide according to claim 12 or 13, having the amino acid sequence NSKH 
at about position 697. 

15. A method of modifying starch in vitro, the method comprising treating starch to be 
modified under suitable conditions with an effective amount of a polypeptide according to 
any one of claims 12, 13 or 14. 

16. A method of altering a plant host cell, the method comprising introducing into the cell 
a nucleic acid sequence comprising at least 200bp and exhibiting at least 88% sequence 
identity with the corresponding region of the DNA sequence shown in Figures 4, 9, 10 
or 13, operably linked in the sense or anti-sense orientation to a suitable promoter active 
in the host cell, and causing transcription of the introduced nucleotide sequence, said 
transcript and/or the translation product thereof being sufficient to interfere with the 
expression of a homologous gene naturally present in the host cell, which homologous 
gene encodes a polypeptide having SBE activity. 

17. A method according to claim 16, wherein the host cell is from a cassava, banana, 
potato, pea, tomato, maize, wheat, barley, oat, sweet potato or rice plant. 
18. A method according to claim 16 or 17, comprising the introduction of one or more 
further nucleic acid sequences, operably linked in the sense or anti-sense orientation to a 
suitable promoter active in the host cell, and causing transcription of the one or more 
further nucleic acid sequences, said transcripts and/or translation products thereof being 
sufficient to interfere with the expression of homologous gene(s) present in the host cell. 

19. A method according to claim 18, wherein the one or more further nucleic acid 
sequences interfere with the expression of a gene involved in starch biosynthesis. 

20. A method according to claim 18 or 19, wherein the further nucleic acid sequence 
comprises at least part of an SBE I gene. 

21. A method according to claim 20, wherein the further nucleic acid sequence comprises 
at least part of the cassava SBE I gene. 

22. A method according to any one of claims 16-21, wherein the host cell is selected 
from one of the following: cassava, banana, potato, pea, tomato, maize, wheat, barley, 
oat, sweet potato or rice. 

23. A method according to any one of claims 16-22, wherein the altered host cell gives 
rise to starch having different properties compared to starch from an unaltered cell. 

24. A method according to any one of claims 16-23, further comprising the step of 
growing the altered host cell into a plant or plantlet. 

25. A method of obtaining starch having altered properties, comprising growing a plant 
from an altered host cell according to the method of claim 24, and extracting the starch 
therefrom. 

26. A plant or plant cell into which has been artificially introduced a nucleic acid 
sequence comprising at least 200bp and exhibiting at least 88% sequence identity with the 
corresponding region of the DNA sequence shown in Figures 4, 9, 10 or 13, operably 
linked in the sense or anti-sense orientation to a promoter operable in plants, or the 
progeny thereof. 

27. A plant according to claim 24, altered by the method of any one of claims 16-22. 

28. Starch obtainable from an altered plant according to claim 26 or 27, having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 

29. Starch obtained from an altered plant according to claim 26 or 27, having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 

30. Starch according to claim 28 or 29 obtained from an altered plant selected from the 
group consisting of:- cassava, banana, potato, pea, tomato, maize, wheat, barley, oat, 
sweet potato and rice plants. 

31. Starch according to any one of claims 28, 29 or 30, having increased amylose content 
compared to starch extracted from an equivalent but unaltered plant. 
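
Claims 6, 16 and 26 recite a sequence of at least 200bp exhibiting at least 88% sequence identity with the corresponding region of a reference sequence. The Python sketch below shows one way such a threshold might be evaluated on two sequences that have already been aligned. It is illustrative only: the claims quoted here do not prescribe an alignment method or an identity formula, so the function names, the treatment of gaps as mismatches, and the synthetic test sequences are all assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the aligned columns of two pre-aligned sequences.

    Gaps are written as "-"; a gap opposite a base counts as a mismatch and
    columns where both sequences have a gap are ignored (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, min_len=200, min_identity=88.0):
    """True if the query spans at least min_len bases at min_identity percent."""
    query_len = len(aligned_a.replace("-", ""))
    return query_len >= min_len and percent_identity(aligned_a, aligned_b) >= min_identity


# Synthetic 200-base example (not taken from the figures): every tenth base differs.
query = "ACGT" * 50
swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
subject = "".join(swap[b] if i % 10 == 0 else b for i, b in enumerate(query))

print(percent_identity(query, subject))  # 90.0
print(meets_threshold(query, subject))   # True
```

In practice the aligned region would first be obtained with a pairwise alignment tool, and the reported identity can depend on that tool's gap handling.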
Fig. 2. 

[Figure: double-stranded nucleotide sequence with the deduced one-letter amino acid translation and restriction sites marked; the sequence text is not legible in this copy.] 

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Fig. 2 (Cont). 



ATCT*CKCCAlC*TCCATTCC*CCT*C*TTTCKCCCCA*CTlC**fC *ATlCTTTCC*TATCCAlCICATCTACATCCTCTCCTTTATCTCATCCT6TTClATCAt*rSATTC*T^T 

TACATGTCCCT ACT ACC TA ACCTCC ATC T AAACTCCCCCTf CA TCT?AC* T ATCAAACC TAT ACCTTCAC TACATC T ACCACACC A aaTACAC TACCAC AACTT ACTATACTAAC JACC A 
MTTMMCCOVOr TCHfWC rrCTAT0vOA*»TL"Lti«O«I mC 

CTCTTCCCACACCCTCTC*CC*TTCCTC AACATCtTACTCCAATCCCAACiCTT?CC*TTCCCCTTCAACATCCTCCTCTTCCCTTTCA7TATC. f CCACAICCCtXTTCCTCAT 
CACAACCCTClCCCACACTCCTAACCACTTCT*CAAfCACCTTACCCTTCTCA*ACCTaACCCCAACTTCI*CCACCAC*ACCCAAACTfcAlaCC*CiC C TCTACCCACAAC£ACTa 
L f * C A ¥ T iCCOVSCn*»T*C l^»C0CC»ir3T»LH#l*»A0« 



• 610 



AAA 

,T?T 



I 

TCCCTT CACAT T ATTC ACA AG ACACA1CAACATTCCAAAATCCCTCACA*TCTaCATA TCC TCACtAACACCCCCTCCTyC&AAAACTCTCTTTCTTATCCTCAAA4TCAT£ACCASfiCC 
ACCCAAtTCTAATAACTCTTCTCTCTACTTCTAACCTTTTACCCACTCTAACATCTATACCACTCCTTCTCCCCCACCAACCTTTTCACACAAACAATACCACTTTCA^TACTCCTCCSfi 
wytr I Q <R0fOW««CCi»MHLTN«»Wtt«C»$TAtSMO0A 

CTTCTTCCTCAC AAAAC T ATtCC ATT TTCCC *CATCCACAACCA TA tC TATCAC T TC ATCCOCCTCAC ACACCA TCTAC TCC TCTTATACATCCTCCAATACCATTCCACAAAATCATC 
CA^AAXCACTCTTTTCATAACCfAAAACCCACTACCTCTTCCTATACAfACTCAACTACCCACCACfCTCTCCrACAfCACCACAATATCTACCACCTTAfCCTAACCTCTTTTACTAC 

t¥C0K t lAfwi nO«n*»Tor«APo«*»S T *»L io»c:aihk^i 

NCOI 

flC H CTTAT TAc'cA TCCCC t tacccccacaacca t at t tcaa t t t t a tcccaaa tca atttccac a tcc ^cactcca ; TQA T T? *cc aacaccccatccaca tc TCCCt AATCCTAAACt A 

TCCCAATAATCCTACCCCAATCCCCCTCTTCCTAfAAACTTAAAArACCCTTTACTTAAACCTCrACCACTCACCTAAClAAAACCTTCTCCCCTACCtCtACArCCCTrACCATTTCA? 

EcoW v 

j^tTCCftCCfl<**CAACCAC AC T TATCAT AAATCCCCTCCTACATTTCAFCTACCTCATCC ACACTA TCTAACATATCATCCAATCC AACACTTTCATCACCt AATCCAACATCTTCAACAA 
f AAftCTCCCTTCTTCCTCTCAATACTAI TTACCCt ACC*TCTAAAC TACAtCC AC TACC*CTCA TACAITC TATACTACC T T aCCTTCTCAAAC t ACTCCCTTACCTTCT ACAACTTCTT 

i sfOt c *»«rDicoAOTi«»NC«QcrooA*aHL re 

^^y^y^yyYg4TQ4CTTCTCACCAXCAC TATATAICACCCAACCATCAACCACATCCCATCAT TCTCTTTCACACCC&AAACC T TCTTTTTCTATTCAACTTTCATTCCAC T AACACC 
CCCATAXCAJUCTACTCAACACTCCTCCTCATATATACT«CTTCCTACTTCCTCTACCCTACTAA£AMAACTCTCCeCTTTCCAA^ 
ATCFHTSCHO' I $ft «OCCO«l i v f t * > C * L 1 f * * * f M * 1 U S 

Y^Yjg^gjIYj^ggQjfcCTTCCCTCCTTC AACTCACCAAACTACAACATTCTTTTCCACrCCCATCATCCCTTCT TTCCACCC T AACACCC TTACTCATCATCCCCACf AC TTC ACCTTT 
ATA^TCTAJklCCCTCAACCCACCAACTTCACTCCTTTCATCTTCTAACAAAACCICACCCTACTACCCAACAAACCTCCCAACTTCTCCCAATC^^ 

TSOttvCCr* S C * t« I VLOSOOCl rc cr***t S HO A.€Hrrr 

CACCCftTCCTATCATAACCCCCC TCCCTCCTTCATCCfATATCCACW 

CTCCCCACCATACTATTCCCCCCACCCACCAACTACCATATACCTCCTACATCCTCTCCTCACCACCTAfCAAAtCATCTTCTACTTCTCTTACTTCCTCTCTTACTTCA^CTTTCACTT 



0 c 



CSC 



BamH i MncH 

gy ^^^jfc g^^^gygCCCCTCACATACATAT TTAC ? AACACCAtCCCC T AAACCACCAATCCTTAACCTCTCCATC TCC ATTCAACCACCTATATTCACACT TCAATTCATT TCC TCC TC A 
CACTTTCCTCCCACCCCCAC TCTATCTAT AA ATCATTCTCCTACCCCATTTCCTCCTTACC AAfTCCACACCTACACCtAAC fCC^CCAT ATAACTCTCAACTTAACTAAACCACCAC* 
¥ K P A S C 

S«pl Wl N(MI 

^ ^*^^4 ^ACAAt'aTTAATTCC AACCCTC AACCCACACAT AC ACCCC A T A ATCC ATCATC ATA TCAAASCTCCCCAAC T TQT AAATC AT TTACC AA&CTCCCTCCACTCTCT 4 AA T T A T ATC 
CCTCTCTCTTATAATTAACCTTCCCACTTCCCTCTCTATCTCCCCT»TTACCTACTACtATACTTTCCACCCCTTCAACATTTACTAAATCCTTCCACCCACCTSACACATT*AATATAC 

Sca I Nco I

TACTACTTTCCCAACTCACCTTATTAT CCATACCATCCATCTCCCCr*CCAAAAATTT?C:t:TATACCCCTACTACCATTTTTAAATCTCCCATCTTC;ACATAAACT^r:c'TCAA'a 
ATCATCAAACCCTTCACTCCAATAATACCTATCCTACCTACACCCCATCCTTTTTAAAACACATAtCCCCATCATCCTlAAAATT?ACACCCTACAACC?CTATT?CACCACCAACtTAC 

rTCCCCCACTAtTTTTCACTAAAATCATTCAACTTATTCTTCACTTCCCCCICTCAAAAAAAAAAAAAAAAAAA 

aacccCCTCataaaaactcattttactaacttcaataacaactcaacccccacacttttttttttttttttttt 
Fig. 4.



eTctcfWTT(fc*cccA**rcccACKf*tucAriicicc^ 

CACACATTCAAC" .;CTTfACCCTCtCArCTCCTAlACtCCTT4lCf AAAACCAACACCACCTCACir*;- . ACACt IACA TCCCCCAACCT ACCCA TACCC TCC ?CC*CCACA4CCC 

n c h ▼ t ib&i*^*»CAp|. ctioSTcr»*c*«»Tssc 

T T TCC T TC AAC T TC A ACCACCC &t T T TC T ACCACCC TC T TC TC TCCA AAC TC AICTC ATCAATC TCAC TCC TC AAA I CT AATCCTC AC TCC T TC T AAAACACTCC T TCC TCATCCTCCCA ^ 
AAACCAACTTCAACTTCCTCCCCAAAACATCCT£CCACAACACACCrT«CACTACACTACTTACACTCACXACTTTACAlTACCAClCACCAACArTrTCTCACCAACCACTACCACCCT 



srMF«CArs«»*rsc*S5MES0SSN** #TAS<»* 



P 0 C » 



TTCAATCCTATTCTTCTTCAAC ACATC AA ' TCCAACCCCC TCCCACACTTTCACAACAATCCC ACCTCCTTACTCATCT TCACACTCTCaTTATCCATCAT AACAT T CTTCAACATCAAC 
AACTTACCATAACAACAACTTCTCTACTTAACCTTCCCCCACCCTCTCAAACTCTTCTTACCCTCCACCAATCACTACAACTCTCACACTAATACCTACTATTCTAACAACTTCTACTTC 



ICCTSSSTOOL.CArCTvSCCSO*! T 0 * C $ i 



mod* ivcoe 



Xmn I

TAAATAAACAATCTCTTCCAATCCC CGACACACTlACCATCACAAAAATTC<UTCTAAACCAACCTCCATTCCTCCACCCCCCACACCCCAAACA*:ATATCACATACArCC*ACCTTCA ^ 
ATTTATTTCTTACACAACCTIACCCCCTCTCTCAATCCTACTCTTTTTAACCTACAITTCCTTCCACCTAACCACCTCCCCCCTCTCCCSTrTCTriTATACTCTATCIACCTTCCAACt 



en 



6O0 



CACCCTTTCCTCAACACCTACATTACCCC TATTCACACTACAAAACACTCCCACAACAAATTCACAACrAT(UACCTACtCTCCATCCArTTTCKCrcc:T4TCAAAACTTTCCTTTCT 
CTCCCAAACCACTTCtCCATCTAATCCCCATAACTCTCATCITTTCTCACCCTCTTCTTFAACICTTCATACTTCCArCACACCTACCTAAAACACCACCCATACTTTTCAAACCAAACA 

Tcr«o"Mto*«*soT««i«CCto«TrcsiOArs»c»c*rcr 
Cacccactcaaac accaataac t t at acacactccccacc accacc t acctccxctccat TCAT TCCACATTTC a* t aaC tccaatcct aa tcc *ca tctc a tcac tc acaa tcactctc 

CTCCCTCACTTTCTCCTTATTCAATATCTCTCACCCCTCCTCClCCArCCACCCCACCTAACTAACCTCTAAACTTATTCACCTTACCATTACCTCTACACTACTCACTCTTACTCACAt 
SBSCTC irtl»fWAPCATwAALtC0 VNIiWNfWAO*»»TONrC 

B0U Ncol x*>t 

CTCTCTCCCACATC TT T T TGCCCAATAA TCC AC A TCC"C ACCACC AATTCCCCATCCTTC TCCAC T AA ACAt ACCC A TCJA T AC TCC A TC TCCC AACAAACATTCTATTCC TCC TTCCA ^ 
CACACACCCTCTACAAAAACCCCTTATTACCTCTACCAACTCCTCCT'AACCCCTACCAACACCTCATTTCTA^CCCTACCTATCACCTACACCC — ^TrTCTAACATAACCACCAACCT 

C¥Wt: r L p NM *ocsPP »Mcr>Pv<i»-n;rs-^».^csiPAw 

TCAACTTCTCACTTCAACCACCAC CTCAACTCCCATATAATCCCATAtarTATCATCCTCCCCACCACCACAACTArCTCTTCAAAAATCCTCACCCAAACACACCAAAATCACTTCCCA ^ 
ACTTCAACACTCAACTTCCTCCTCCACTTCACCCTATATTACCCTATATCAIACIACCACCCCTCCTCCTCTTCATACACAACTTTTTACCACTCCCTT^CTCTCCTTTTACTCAACCCT 



|KFSVQAPCllPTMCIT*OPPtfCt»*F«»«» > C' , <» p 



K S L » 



TTTATCACTCCCACCTTCCAATCACTACTAC CCACCCACIAATTAACACArATCCCAACTTTACACATCATCTCCTTCCTCCCAlCAAAAACCTTCCCTACAATCCTCTTCACCTCATCC 

aaatactcaccctccaaccttactcatca:ccctccctcattaattc-cta?acccttcaaaictctactacaccaaccaccct»ctttttccaacccatcttaccacaactccactacc 

, Tt SM»C«SSTC*»»l»«T*A»*rRDO»l f> Q I < * L CTMAVOL* 
CTATTCAACACCATTCATATTATCCTACTrrTCCCTATCACCTCACAAACTT'TATCCACCTACCACCCCATTTCCAACTCCTCATCATTTAAACrCTCTAATACATAAACCTCACCACr ^ 
CATAACTTCTCCTAACTATAATACCATCAAAACCCATACTCCACTCTTTCAAAATACCTCCATCCTCCCCTAAACCrTCACCACTACMAATTTCiCiCATTATCTATTTCCACTCCTCA 

4 , QE HS**As r = 'H** , » rT **ss«rc*PO0L< sl iocahC 

TACCTCTTCTTCTTCTCATCCATAT TCTTCATACCCATCCATCAACTAA'ACCTTCCATCCCCTC^ x TCT T TCATCCT ACCCAT;CTC*C T AC T T T C AC TC TCCACC ACCCCCTCATC 
ATCCACAACAACAACA4lACCTAlAACAACTATCCCTACCTACT*CA|-ATCCAACCTACCCCiC"*TACAAACTACCATCCC* - CC ACTCA7CAA*CTCASACC TGCTCCCCCACTAC 
LCLL»L«0: /HS^AS'^*tOC ..Mrif OC-DCH T fHSCP«CM 
Fig. 4 (Cont).

ATTCCATC • .; AC TCTCCCCTT TTCAACT*TCCCACCTCCCACCTTCTAACCTTW ^ 
TAACCTAC ACCCTCACACCCCAAAACTTCATACCCTCCACCC TCCAACATTCCAAACAAC* * ACTTTACCTTCCACCACCAACCTACTCA7CTTC AAACT ACCCAACTCTAAACTACCCC ?• 

MWBWOSRl .rwTCSw€*LRrLLS*)ABtfULOET«rocF»roc 

TCACT — aTCATCTACACCCAKATCC aTTCCACCTACATTTTACCCCC^^ |MQ 
ACTCAACT 1 AC T »c ATCTCCCTACTACC T AAC3TCCATCT AAAATCCCCCTTCATCTTACT TATCAAACC TATACCTTCACTACATC TACGAC ACCAAATAAACTACCACAACTTACTAT 

V TS«HTTMHCL0V0rTCNTI*ETrCTATO»OA*VTLHLL»*O 

TCATTCATCCTCTCTTCCCACAC CCTCTCACCATTCCTtUACATCTTACTgCAATCCCAACACTTTCCATTCCCCTTCAACATCCTW |e#() 
ACT AAfiTACCACACAAGCGTC TCCCACACTCCTAACC ACTTCTACAATCACCTTACCCTTCTC AAACCTAACCCCAACTTCTACCACCACAACCiAAACTAATACCACACCTCTACCCAC 

M|MCtrpeA VTlCE0VSCHPTVCIP»£OCCWCF0YBt«riA 

Y YQ^ Yqj^Y^^^TQCCTTCA CAT TATTC ACA ACACACATCAACaT TCCAAAATCCCTCACAT TCTACATaTCC TCACCAACACCCCCTCCrTCCAAAACTSTCTTTC TTATCCTCAAACTC ^ 
AA CQ^Cf A X TTACCCAACTC T AATAACTCTTCTCTC TACTTCTAACCTTTTACCCAC TCTAACATCTA? ACCACTCCTTCTCCCCCACCAACC TTTTCACACAAACAATACCACTTTCAC 
y .QKWVe I I OtBOfOWltHCOIVMML TMBBWc€«C»3»AES 

aTCACCA CCCCCTTGTTCCTCAC AAAACTATTCCAT TT TCCC TQATCCAC AACCATATgTATQAC TTC A TCCCTCTTCACACACC A TC T AC T CC TC TCAT ACATCCTCCACT ACCATTCC ^ 
TACTCCTCCCCCAAC AACCAC TCTTTTCAT AACCTAAAACCCACTACCTCTTCCTATACAT AC TCAACTACCCACAACtCTCTCCTACATCACCACACTATCTACCACCTCATCCTAACC 

HOOALVCCKT | a WLHD«0«*OrnALO«PSTrLlO*CVAL 

acaaaatLtcacccttattaccatcccatta(xcccacaaccatattt w tttt ^ 

TCTTTTACTACTCCCAATAATCCTACCC^ 

M « H t R L i TUCLCCECTLHrHCMCrCHPEVlOrPftCOLMLP 

EcoR V Bcl I

CTCCTAAATTTCTTCCTCCCAACAATTAC^^^ 2(fi0 
CACCATTTAAACAACCACCCTTCTTAATCTC AAT AC TATTT ACCCCCCC ATCCAAAC TACATCCCTT AACTTTCCTACACTC TATACt ACC TT*CCT TC TC AAAC TACTTCCT TAACTCC 

sc«r»PCHHTSTO«c«i»«FOLCN-5BHiRfHcrocroOAio 

ATCTTCAACAACCCTATCCTTTCATCACTTCTCA^ ^ 
^Hgjm^f XCTTCCCATACC AAAC TACT CAACACTCCTCCTT ATCT A TACT CCC TTC C TACT T TCCCTACCCTACTAACACAACCTCTCCCCTTTGSACCAAAAACATAACTTAAAACTAA 

„ LEEA TCr«TSEMOrlS»KOEROBI IvrEBCKlwF*'"'" 

CCA£TACCACCTATTCCCATTACCCACTTCCCTCCTTAAACCCACCAAACTACAA6ATACTCTTCCATTCACATW ^ 
CCTWTC6T ^ CATAAWCTAATCCCTCAACCCACCJU TTTCCCTCCTT 

W TSSTSOTBVCCL«PCCT*lVLOSO0PLrCCr C «LSMO4E 

ACTTCACCT^TCAACCCTCCTACCATAAC^ 25TO 
TCAACTCCUAACTTCCCACCATCCTATTCCCCCCA^ 

M rsrECWTONRPBSrnvTTPCRT»vv TAL < E 0 E v F N E L E * 

T CCCCCCTTAACATATATCTTAACAACACCTTCTCAACCACCAATCCCATTATTCATCTTCCTATCTT ^ 
ACCCCCCAATTCTATATACAATTCTTCTCCAACACTTCCTCCTTACCCTAATAACTACAACCATACAA 
Fig. 5.



#60 #70 #80 #90 #100 #110 #120

125-94 seq TAGTTTTCGGTACCATGTC AC AAACTTTTTTCCACC TACCAGCCGATTTGGAAC TCC TGATCATTTCAAG
TAGTTTTGCGTA CA GTCACAAACTTTT TCCA C T AGCACCCCATTTGGAACTCC TCATGATTT AAG 
1 16 seq TAGTTTTCGGTATCACGTC ACAAAC TTTTATGC AGCTAGCAGCCGAT TTGGAACTCCTGATGATTTAAAG 

*1140 *1150 *1160 *1170 *1180 *1190 *1200

#130 #140 #150 #160 #170 #180 #190 

125-94 seq TCTTT A ATAGAT AAAGCTCATGAGTTAGGGCTCCTTGTTCTCATGGATATTGTTC ATAGCCATGCCTCAA 
TCT TA A T AGAT AAAGCTC A GAGTTAGG CT C T TGTTCTCATGGATATTGTTCAT AGCCATGC TCAA 

1 16 seq TCTCT AAT AGATAAAGC TCACGAGTTAGGTCT TCTTGTTCTC ATGGATATTGTTCATAGCCATGCATCAA

*1210 *1220 *1230 *1240 *1250 *1260 *1270

#200 #210 #220 #230 #240 #250 #260 

125*94 seq ATAATACGTTGGATGGGCTGAACaTGTTTGATGGTACGGATAGTCACTACTTCCACTCCGGATCACGGGG 
TAATACGTTGGATGGGCTGAA ATGTTTGATGGTACGGAT GTCACTACTT CACTC GGA CACGGGG 

1 16 seq CTAATACGTTGGATGGGCTGAATATGTTTGATGGTACGGATGGTCACTACTTTCACTCTGGACCACGGGG 
*1280 *1290 *1300 *1310 *1320 *1330 *1340

#270 #280 #290 #300 #310 #320 #330

125-94 seq TC ATCATTGGTTGTGGGAC TCTCGCCTTTTC AACTATGGAAGCTGGGAGGTGCTAAGATTTC f TCTTTCA 
TCATCATTGG TGTCGGACTC CGCCTTTTCAACTATGG AGCTGGGAGGT CTAAG TTTCTTCTTTCA 

1 16 seq TCATCATTGGATGTGGGACTCCCGCCTTTTCAACTATGGGAGCTGGGAGGTTCTAAGGTTTCTTCTTTCA 
*1350 *1360 *1370 *1380 *1390 *1400 *1410

#340 #350 #360 #370 #380 #390 #400 

125-94 seq AATGCAAGATGGTGGTTGGAAGAGTACACGTTTGATGGTTTTAGATTTGATGGGGTGACTTCCATGATGT 
AATGCAAG TGGTGGTTGGA GAGTACA GTTTGATGG TT AGATTTGA GGGGTGACTTC ATGATGT 

1 16 seq AATGCAAGGTGGTGGTTGGATGAGTACAAGTTTGATGGGTTCAGATTTGACGGGGTGACTTCAATGATGT 
*1420 *1430 *1440 *1450 *1460 *1470 *1480

#410 #420 #430 #440 #450 #460 #470

125*94 seq AC AC TCC CC A TGGGTTGCAGGTAGCTTTTACTGGC A ACT AC AATGAGT AC TTTGGATATGCAACTGATGT 
ACAC C CATGG TTGCAGGTAG TTTTAC GGCAACTACAATGA TACTTTGGATATGCAACTGATGT 

1 16 seq ' AC ACCC ATC ATGGA TTGC AGGT AG ATT TTACC GGCAACTACAATGA A TACTTTGGATATGCAACTGATGT 
*1490 *1500 *1510 *1520 *1530 *1540 *1550

#480 #490 #500 #510 #520 #530 #540

125*94 seq AGATGCTGTGATTTATTTGATGCTTGTGAATGATATJGATTCACGGTCTTTTCCCTGAGCCTGTTACCATT 
AGATGCTGTG TTTATTTGATGCT TGAATGATATGATTCA GGTCT TTCCC GAGGCTGT ACCATT 

1 16 seq AGATGCTGTGGTTTATTTGATGCTGTTGAATGATATGATTCATGGTCTCTTCCCAGAGGCTGTC ACCATT 
*1560 *1570 *1580 *1590 *1600 *1610 *1620

#550 #560 #570 #580 #590 #600 #610

125*94 seq GGTGAAGATGTTAGCGGAAAGCCAACATTTTGCATTCCAGTGGAAGATGGTGGTGTTGGATTTGATTACC 
GGTGAAGATGTTAG GGAA GCCAACA TTTGCATTCC GT GAAGATGGTGGTGTTGG TTTGATTA C 

1 16 seq GGTGAAGATGTTAGTGGAATGCCAACAGTTTGCATTCCGGTTGAAGATGGTGGTGTTGGCTTTGATTATC 
*1630 *1640 *1650 *1660 *1670 *1680 *1690

#620 #630 #640 #650 #660 #670 #680

125-94 seq GTCTCCACATGGCCATTGCCGATAAATGGATTGAGATTCTTAAGAAGAGAGATGAGGACTGGAAAATGGG 
GTCTCCACATGGC TTGC GATAAATGG TTGAGATT TT AGAAGAGAGATGA GA TGGAAAATGGG 

1 16 seq GTCTCCACATGGCTGTTGCTGATAAATGGGTTGAGATTATTCAGAAGAGAGATGAAGATTGGAAAATGGG 
*1700 *1710 *1720 *1730 *1740 *1750 *1760

#690 #700 #710 #720 #730 #740 #750

125*94 seq TGACATTGTGCATACACTCACCAACAGAAGGTGGTTGGAAAAATGTGTTGCTTATGCTGAAAGTCATGAC 
TGACATTGT CATA CT ACCA :AG GGTGGTTGGAAAA TGTGTT CTTATGCTGAAAGTCATGAC 

1 16 seq TCACATTGTACATATGCTGACCAACAGGCGGTGGTTGGAAAAGTGTGTTTCTTATGCTGAAAGTCATGAC 
*1770 *1780 *1790 *1800 *1810 *1820 *1830

#760 #770 #780 #790 #800 #810 #820

125-94 seq CAAGCTCTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGACATGTACGACTTCATGGCTC 
CA GC CTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGA ATGTA GACTTCATGGCTC 

1 16 seq CAGGCCCTTGTTGGTGACAAAACTATTGCATTTTGGCTGATGGACAAGGATATGTATGACTTCATGGCTC 
*1840 *1850 *1860 *1870 *1880 *1890 *1900

#830 #840 #850 #860 #870 #880 #890

125*94 seq GTGACAGACCATCTACTCCTCTTATAGATCGTGGAATAGCATTGCACAAAATGATCAGGCTTATTACC AT 
TGACA^ACCATCTAC CCTCT ATAGATCGTGGA TAGCATTGCACAAAATGATC AGGCTTATTACC AT 

1 16 seq TTGAC A JACCATCTACCCCTCTCATAGATCGTGGAGTAGCATTGC ACAAAATGATCAGGCTTATTACCAT 

*1910 *1920 *1930 *1940 *1950 *1960 *1970

#900 #910 #920 #930 #940 #950 #960

125*94 seq GGGCTTAGGCGGAGAAGGATATTTGAATTTTATCGGAAATGAATTTGGAC ATCCTGAGTGCATTGATTTT 

GGG TTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACA CC GAGTGGATTGATTTT 
1 16 seq GGGATTAGGCGGAGAAGGATATTTGAATTTTATGGGAAATGAATTTGGACACCCCGAGTGGATTGATTTT 
*1980 *1990 *2000 *2010 *2020 *2030 *2040
Fig. 5 (Cont).

#970 #980 #990 #1000 #1010 #1020 #1030

125*94. seq CCAAGAGGGGATCGACATCTGCCCAATGGTAAAGTAATTCCAGGGAACAACCACAGTTATGATAAATGCC 

CCAAGAGG GATC ACATCT CCCA TGGTAAA T TTCC GGGAACAA ACAGTTATGATAAATGCC 
1 16. seq CCAAGAGGTG AT CTACATCTTCCCAGTGGT A A ATT TGTTCCTGGG A ACAATT ACAGTTATGATAAATGCC 

*2050 *2060 *2070 *2080 *2090 *2100 *2110

#1040 #1050 #1060 #1070 #1080 #1090 #1100

125*94. seq GTCGTAGATTTGATCTAGGTGATGCAGACTATCTAAGATATCATGGAATGCAAGAGTTTGATCAGGCAAT 

G CGTAG TTTGATCTAGG AT CA A ATCT AGAT ATCATGGAATGCAACAGTTTGATCA GCAAT 
1 16 seq GGCGTAGGTTTGATCTAGGCAATTCAAAGCATCTGAGATATCATGGAATGCAAGAGTTTGATCAAGCAAT 

*2120 *2130 *2140 *2150 *2160 *2170 *2180

#1110 #1120 #1130 #1140 #1150 #1160 #1170

125-94. seq GCAACATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAGTATATATCACGGAAGGATGAAGGA 

CA CATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCA TA ATATCACGGAAGGATGAA G 
1 16. seq TCAGCATCTTGAAGAAGCCTATGGTTTCATGACTTCTGAGCACCAATACATATCACGGAAGGATGAAAGG 

*2190 *2200 *2210 *2220 *2230 *2240 *2250

#1180 #1190 #1200 #1210 #1220 #1230 #1240

125*94 seq GATCGGATCATTGTCTTTGAGAGGGGAAACCTTGTTTTTGTATTCAACTTTCATTGGACTAACAGCTATT 

GATCGGATCATTGTCTT GAGAGGGGAAACCT GTTTTTGTATTCAA TTTCATTGGACTA CAGCTATT 
1 16 seq GATCGGATCATTGTCTTCGAGAGGGGAAACCTCGTTTTTGTATTCAATTTTCATTGGACTAGCAGCTATT 

*2260 *2270 *2280 *2290 *2300 *2310 *2320

#1250 #1260 #1270 #1280 #1290 #1300 #1310

125*94. seq CAGATTACCGAGTTGGCTGCTTCAAGTCAGGAAAGTACAAGATTGTTTTGGACTCGGATGATGGCTTGTT 

C GATTACCGAGTTGGCTGCTT AAG CAGGAAAGTACAAGAT GT TTGGA TC GATGAT TTGTT 
1 16. seq CGGATTACCGAGTTGGCTGCTTAAAGCCAGGAAAGTACAAGATAGTCTTGGATTCAGATGATCCTTTGTT 

*2330 *2340 *2350 *2360 *2370 *2380 *2390

#1320 #1330 #1340 #1350 #1360 #1370 #1380

125*94 seq TGGAGGCTTCAACAGGCTTAGTCATGATGCCGAGCACTTCACCTTTGACGGGTGGTATGATAACCGGCCT 

TGGAGGCTT CAGGCTTAGTCATGATGC GAGCACTTC A CTTTGA GGGTGGTA GATAACCGGCCT 
1 16 seq TGGAGGCTTTGGCAGGCTTAGTCATGATGCAGAGCACTTCAGCTTTGAAGGGTGGTACGATAACCGGCCT 

*2400 *2410 *2420 *2430 *2440 *2450 *2460

#1390 #1400 #1410 #1420 #1430 #1440 #1450

125*94. seq CGGTCCTTCATGGTATATGCACCATCTAGGACAGCACTGGTCCATGCTTTAGTAGAAGATGAAG 
CG TCCTTCATGGT TA CACCAT TAG ACAGCAGTGGTC ATGCTTTAGT GA GATGAAG 
1 16. seq CGATCCTTCATGGTGTACACACCATGTAGAACAGCAGTGGTCTATGCTTTAGTGGAGGATGAAG 

*2470 *2480 *2490 *2500 *2510 *2520 *2530
Fig. 6.



*10 *20 *30 *40 *50 *60 *70

125-94 pro SFGYHVTNFFAPSSRFGTPDDLKSLIDKAHELGLLVLHDI VHSHASNNTLDGLNMFDGTDSHYFHSGSRG 
SFGYHVTNF: A: SSRFGTPDDLKSL 10KAHELGLLVLM0 I VHSHAS. NTLDGLNMFOGTD: HYFHSG: RG 
1 16 pro SFGYHVTNFYAASSRFGTPDDLKSL 1 DKAHELGLL VLMD I VHSHASTNTLDGLNMFDGTDGHYFHSGPRG 
*370 *380 *390 *400 *410 *420 *430

*80 *90 *100 *110 *120 *130 *140

125-94 pro HHWLVDSRLFNYGSVEVLRFLLSNARWVLEEYRFDGFRFDGVTSMMYTPHGLOVAFTGNYNEYFGYATDV 
HHW WOSRLFNYGSWEVLRFLLSNARWWL: EY: FOGFRFDCVTSMMYT. HGLQV. FTGNYNEYFGYATDV 
1 16 pro HHWriWDSRLFNYGSWEVLRFLLSNARWWLOEYKFDGFRFOGVTSMflYTHHGLQVDFTGNYNEYFGYATOV 
*440 *450 *460 *470 *480 *490 *500

*150 *160 *170 *180 *190 *200 *210

125-94 pro DAV I YLMLVNDM I HGLFPEAVT I GEDVSGKPTFC 1 P VEOGGVGFDYRLHMA I ADKWI E I LKKRDEDWKMG 
DAV YLML: NOM I HGLFPEAVT I GEDVSG. PT C [PVEDGGVGFOYRLHMA: ADKW: EI: : KRDEDWKMG 
116 pro DAV V YLMLLNDM I HGLFPEAVT I GEDVSGMPTVC I PVEDGGVGFOYRLHMAVADKWVE I I OKRDEDWKMG
*510 *520 *530 *540 *550 *560 *570

*220 *230 *240 *250 *260 *270 *280

125-94 pro D I VHTLTNRRWLEKC VAYAESHDQALVGOKT ! AFWLMOKDMYDFMARDRPSTPL I ORG I ALHKM I RL I TM 
01VH LTNRRWLEfCCV: YAESHDQALVGOKT 1 AFWLMDKDMYDFMA DRPSTPL I DRG: ALHKM I RL I TM 
116 pro D I VHMLTNRRWLEKCVS YAESHDQALVGOKT I AFWLMDKOM YDFMALDRPSTPL I ORG V ALHKM I RL I TM
*580 *590 *600 *610 *620 *630 *640

*290 *300 *310 *320 *330 *340 *350

125-94 pro GLGGEGYLNFMGNEFGHPEWIDFPRGDRHLPNGKV 1 PGNNHSYOKCRRRFDLGDADYLRYHGMOEFDOAM
* GLGGEGYLNFMGNEFGHPEWI DFPRGO HLP: GK : PGNN. SYOKCRRRFOLG: : . . LRYHGMOEFDQA: 

1 16 pro GLGGEGYLNFMGNEFGHPEVIDFPRGDLHLPSGKFVPGNNYSYDKCRRRFDLGNSKHLRYHGMQEFOOAI
*650 *660 *670 *680 *690 *700 *710

*360 *370 *380 *390 *400 *410 *420

125-94 pro QHLEEA YGFMTSEHQY I SRKDEGOR 1 1 VFERGNLVFVFNFHVTNSYSD YR VGCFKSGK YK I VLDSOOCLF 
QHLEEA YGFMTSEHQY I SRKOE OR 1 1 VFERGNL VFVFNFHWT: SYSOYRVGC: K: GKYK I VLOSOO LF 
116 pro QHLEEAYGFMTSEHQY I SRKOERDR I I VFERGNL VFVFNFHWTSSYSOYRVGCLKPGKYK I VLDSDDPLF 
*720 *730 *740 *750 *760 *770 *780

*430 *440 *450 *460 *470

125-94 pro GGFNRLSHDAEHFTFDGWYDNRPRSFMVYAPSRTAVVHALVEDEENEAENEVE3
GGF RLSHOAEHF: F: GWYDNRPRSFMVY: P. RTAVV. ALVEDE : : - : V. : 
116 pro GGFGRLSHDAEHFSFEGWYDNRPRSFMVYTPCRTAVVYAL VEDEVENEVEPVAG 
*790 *800 *810 *820 *830
Fig. 9.

ATCCACAACCATATCTATCAC7TCATCCCTCTTCA CACACCATCTACTCCTCTCATACATCCTCCACTACCATTCCACAAAATCATCACCCTTATTACC* 
TACCTCTTCCTATACATACTCAACTACCCACAACTCTCTCCTACATCACCACACTATCTACCACCTCATCCTAACCTCTTTTACTACTCCCAATAATCCT 

nOKOflYOrMALORPSTPLIORCVALHKRl HLIT 

TCCCATTACCCCCACAACCATATTTCAATTTTATCCCAAATCAATTTCCACACCCCCACTCCATTCATTTTCCAACACCTCATCTACATCTTCCCACTCC 

i ■ i 1 1 1 ' ■ ■ i i i 200 

ACCCTAATCCCCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCTCTCCCCCTCACCTAACTAAAACCTTCTCCACTACATCTACAACCCTCACC 

flCUCCECYLNrnCNEFGHPEWIOFPRCOLHLPSC 

EcoR V Bcl I

TAAATTTCTTCCTCCCAACAATTACACTTATCATAAATCCCCCCCTACCTTTCATCTACCCAATTCAAACCCTCTCACATATCATCCAATCCAACACTTT 

, t m , ) i | - i i - t 1 ' t t 300 

ATTTAAACAACCACCCTTCTTAATCTCAATACTATTTACCCCCCCATCCAAACTACATCCCTTAACTTTCCCACACTCTATACTACCTTACCTTCTCAAA 

KFVPCNNY5T0ICCRRRF0LCNSICR LRTHCH0EF 

CATCA ACCAATTCACCATCTTCAACAACCCTATCCTTTCATCACTTCTCACCACCAATACATATCACCCAACCATCAAACCSATCCCATCATTCTCTTCC 
CTACTTCCTTAACTCCTACAACTTCTTCCCATACCAAACTACTCAACAC7CCTCCTTATCTATACTCCCTTCCTACTTTCCCTACCCTACTAACACAACC 

OOA |OMLEEAYCFMTSEMQY!SRICD£ROR! I V F 

ACACCCCAAACCTCCTTTTTCTATTCAATTTTCATTCCACTACCACCTATTCCCATTACCCACTTCCCTCCTTAAAC^ 

TCTCCCCTTTCCACCAAAAACATAACTTAAAACTAACCTCATCCTCCATAACCCTAATCCCTCAACCCACCAATTTCCCTCCTTTCATCTTCTATCACAA 

ercnlvfvfnfhwt ssysoy'rvgclkpck Y K I VL 

CCAT TCACATCATCCTTTCTTTCCACCCTTTCCXAgCCTTAW 

CCTAACTCTACTACCAAACAAACCTCCCAAACCCTCCCAATCACTACTACCTCTCCTCAACTCCAAACTTCCCACCATCCTATTCCCCCCACCTACCA^ 
OSOOPL FCCFCRL SHOAEHF SFECVV O NRPRSF 
ATCCTCTACACACC ATCTACAACACCACTCCTCTATCCTTTACTCCACCATCAACTCCACAATCAACTCCAACCrCTCCCCCCTTAACATATATCTTACC 

taccacatctctcctacatcttctcctcaccacataccaaatcacctcctacttcacctcttacttcaccttccacaccccccaattctatatacaatcc 

MVYTPCRTAVVYALVEOEVENEVEPVAC . 

AACACCTTCTCA ACCACCAATCCCATTATTCATCTTCCTATCTCCATCTCCCTTCAACCAAATATATTCACCCTATAATTTCATCTCACC^ 
TTCTCCAACACTTCCTCCTTACCCTAATAACTACAACCATACACCTACACCCAACTTCCTTTATATAACTCCCATATTAAACTACACTCCCACCAACCTC 

ATTTCCATCCTCCT TCTTCCTATTTTCTTCTCATCATAAACATAATCAAACACCAATACCAAACCCACCCTTACATCCTACCTTCCATCATCATACCCAC 
TAAACCTACCACCAACAACCATAAAACAACACTACTATTTCTATTACTTTCTCCTTATCCTTTCCCTCCCAATCTACCATCCAACCTACTACTATCCCTC 

Sac I 

CTCACACCTCCTAAACCATAAA TCTTCAACCTCCCTCCCTTCCCTACTATCTTATCTCCTACTTTCCAATC7TAAATTATCATCATCCCTCTCCATCCTA 
CACTCTCCACCATTTCCTATTTACAACTTCCACCCACCCAACCCATCATACAATACACCATCAAACCTTACAATTTAATACTACTACCCACACCTACCAT 

ACTATCACAATTTTCTAfATATCCCAACCACCATTTTAACTTTTAAAAAAAAAACAAAAAAAATCCATC 

,i t • ' 1 1 1069 

TCATACTCTTAAAACATA7AT ACCCTTCCTCC T AAAATTC AAAATTTTTTTTTTCTTTTTTTTACCT AC 



100 
Fig. 10.



CU I Kpn I 

tatccattc* catccataataccactcactatacccatt::tttttttttttttttttct*cttttscctaccatctca>.aaacttttttccacctacca ^ 
atacctaactctacctattatcctcactcatatccctaaaaaaaaaaaaaaaaaaaaacatcaaaacccatcctacactctttc*aaaaacctccatcct 

SFCTHVTNrf.APS 

CCCCATTTC CAACTCCTCATCATTTCAACTCTTTAATACATAAACCTCATCACTTACCCCTCCTTCTTCTCATCCATAITCTTCATACCCATGCGTCAAA 
CCCCTAAACCTTCACCACTACTA*ACTTCACAAATTATCTATTTCCACTACTCAATCCCCACCAACAACACTACCTATAACAACTATCCSTACCCACTTT 

SRfC TP00LKSLl0iCAH£LCtLVLn0IVHSWASN 

TAATACCTTCCATCCCCTCA ACATCTTTCATCCTACCCATACTCACTACTTCCACTCCCCArCACCCCCTCATCATTCGTTCTCCCACTCTCCCCTTTTC 
ATTATCCAACCTACCCCACTTCTACAAACTACCATCCCTATCACTCATCAACCTCACCCCTACTCCCCCACTACTAACCAACACCCTCACACCCCAAAAC 

NTL 0CLNnFDCTCSMYFHSCSRCHMWLwOSRLP 

AACTATCCAACCTCCCACCTCC TAACATTTCTTCTTTCAAATCCAACATCCTCCTTCCAACACTACACCTTTCATCCTTTTACATTTCATCCSCTCACTT ^ 
TTCATACCTTCCACCCTCCACCATTC^AAACAACAAACTTTACCTTCTACCACCAACCTTCTCATCTCCAAACTACCAAAATCTAAACTACCCCACTCAA 

MYCSWEVLR-PtLSNARWWtEETR-roCFRrOCVT 

Nco I Sca I
ccatcatctacactcc ccatcccttccacctaccttttactcccaactacaatcactactttccatatccaactcatctacatcctctcat t TATTTCAT ^ 

CCTACTACATCTCACCCCTACCCAACCTCCATCCAAAATCACCCTTCATCTTACTCATCAAACCTATACCTTCACTACATCTACCACACrAAATAAACTA 

SnnYTPHCLOV AFTCNTNEtFCTA TOVOAV I Y L H 

CCTTCTCAATCA TATCAT TCACCCTC TTTTCCCTCACCCTCTTACCATTCCTCAACATCTT ACCCCAAACCCAACATTTTCC ATTCCACTCCAACATCCT ^ 
CCAACACTTACTATACTAACTCCCACAAAACCCACTCCCACAATCCTAACCACTTCTACAATCCCCTTTCCGTTCTAAAACC7AACCTCACCTTCTACCA 

L V N 0 1 I HC L F P £ A V T ICEOVSCKP T r C i P V E D C 

CCTCTTCCATTTC ATTACCCTCTCCACATCCCCATTCCCCATAAATCCATTCACATTCTTAAGAACACASATCAgCACTCCAAAATCGCTCACATTCTCC ^ 
CCACAACCTAAACTAATCGCAGACCTCTACCCCTAACCCCTATTTACCTAACTCTAACAATTCTTCTCTCTACTCCTGACCTTTTACCCACTGTAACACC 

CVCFOYRLMWAIAOKWIE ILKKRCEDVK*GD I V 
ATACACTCACCAACAGAACGTC CTTCGAAAAATCTCTTCCTTATCCTCAAACTCATGACCAACCTCTTGTTGCTCACA^AACTATTCCATTTTCGCTCAT ^ 
TATGTCACTCCTTCTCTTCCACCAACCTTTTTACACAACCAATACGACTTTCACTACTCCTTCCACAACAACCACTCTTTTCATAACGTAAAACCCACTA 

mtItnrrwlEkCvataESmOOalvco<t i * f w l * 

Bcl I Nco I

GCACAACGACATGT ACCACTTCATGCCTCGTGACAGACCATCIACTCCTCTTATAGATCCTGCAATAGCATTCCACAAAATGATCACCCTTATTACCATC ^ 
CCTCTTCCTCTACATCCTCAAGTACCCAGCACTCTCTCGTACATCACCACAATATCTACCACCTTATCCTAACCTCTTTTACTAGTCCCAAIAATCGTAC 

OKO«?OFha«OR»SIPLIORCIALHK«IRL I T n 

GCCTTACCCCCAGAAGC AT&TTTCAATTTTATCCCAAATCAATTTCCACATCCTCAGTGCATTGATTTTCCAACAGCCCATCCACATCTCCCCAATGGTA 
CCGAArCCCCCTCTTCCTATAAACTTAAAATACCCTTTACTTAAACCICTACCACTCACCTAACTAAAAGCTTCTCCCCTACCTCTACACCCCTTACCAT 

CLCCECtLMrnCMErcHPtwiOFPRCORHLPwC 
Fig. 10 (Cont).



EcoR V Bcl I



AA CTAATT CCACCCAACAACCACACTTATCATAAATCCCCTCCTACATT^ 

TTCATTAACCTCCCTTCTTCCTCTCAATACTATTTACCCCACCATCTAAACrACATCCACTACCTCTCATACATTCTATACTACCTTACCTTCTCAAACT 



K V 



|PCNNHSYOKCR«*F0LCOAOYLRYWCnOEFD 



TCACCCAATCC AACATCTTCAACAACCCTATCCTTTCATCACTTCTCACCACCACTATAT ATCACCCAA^CATCAACCACATCC&ATCATTCTCTTTCAC ^ 
ACTCCCTTACCTTCTACAACTTCTTCCCATACCAAACTACTCAACACTCCTCCTCATATATACTCCCTTCCTACTTCCTCTACCCTACTAACACAAACTC 

OAnOHtEEArCFtlTSEMOYISRrOECORI I V F E 

ACCCCAAACCTTCT TTTTCTATTCAACTTTCATTCCACTAACACCTATTCACATTACCCUCTTCCCTCCTTCAACTCA 

TCCCCTTTCCAACAAAAACATAACTTCAAACTAACCTCATTCTCCATAACTCTAATCCCTCAACCCACCAACTTCACTCCTTTCATCTTCTAACAAAACC 
RCNL vrvFNFHWTNSYSDYRVCCFKSCK ft I V L 

ACTCCCATCATCCC TTCTTTCCACGCTTC^ l400 
TCACCCTACTACCCAACAAACCTCCCAACTTCTCCCAATCACTACTACCCCTCCTCAACTCCAAACTCCCCACCATACTATTCCCCCCACCCACCAACTA 

OSOOCL FCCF MRLSHOAEMFTFOCWYDNRPRSFH 

CCm ATGCACC ATCTACCACACCACTCCTCCATCCTT^ |$oo 
CCATATACCTCCTACATCCTCTCCTCACCACCTACCAAATCATCTTCTACTTCTCTTACTTCCTCTCTTACTTCATCTTTCACTTCACTTTCCTCCCACC 

VYAPSRTAVVHALVEOEENEAENEVCSEVKPAS 

BtfflH I : H*C« 

CC CTCACATACATATTTACTAACACCATCCCCTAAAC^ iaa) 
^gACTCTATCTATAAATCATTCTCCTACCCCATTTCCTCCTTACCAATTCCACACCTACACCTAACTTCCTCCATATAACTCTCAACTTAACTAAACCA 



Nsi I

CCTCACCACA CACAATATTAATTCCAACCCTCAAWCACACATACACCCCATAATCCATCATCATATCAAACCTCC^ |?00 
CCACTCCTCTCTCTTATAATTAACCTTCCCACTTCCCTCTCTATCTCCCCTATTACCTACTACTATACTTTCCACCCCTTCAACATTTACTAAATCCTTC 

Sca I Nco I

CTCCCTCCACTCTCTA AATTATATGTACTACTTTCCCAACTCACCTTATTATCCATACCATCCATCTCCCCTACCAAAAATTTTCTCTAT^ 
CACCCACCTCACACATTTAATATACATCATCAAACCCTTCACTCCAATAATACCTATCCTACCTACACCCCATCCTTTTTAAAACACATATCCCCATCAT 

Xmn I

CCATTTTTAAATCTCCC ATCTTCCACAT AAACTCCTCCTTCAATCTTCCCCCACTATTTTTCAGTAAAATCATTGAACTTATTCTTCACTTGCGCCTCTC 
CCTAAAAATTT ACACCGTACAACGTCTATTTCACCACCAACTTACAACCCGCTCAT AAAAACTCATTTTACT AACT TCAAT AACAACTCAACCCGCACAC 



1010 



TTTTTTTTTTTTTTTTTTT 
Fig. 13.



JUmi 
l Stmt 
Saol i . BwnHl 

AaTCAATTCCAOCTCOttTiLCCCCQflCATCCCATTCCCAmCTCCCTirrCCrTTCCttr^ATmCATA^ ^ 

TC*eTTA JM !fTf*^ rATttg ^ ffgrACMTA ^ aTiU ^^ 



Nco I

rC TCACCCAAATWTATACTACACTCT*TC^CC«^ 



A«A£TCCCTTTACCATATCATCTCACATACTCCCrATKAA^ 

n vrtTfSCt«rPCA^»tT«sotT$rHcc««Ts*Ck*r 

CCTryT n ^ 74ACCA ^ TC TTTCCTCCttAA^TCrrT8CT6qAAACrCCTC^ ^ 
tL*«CLFPtKtFAC«SS'»CS0SS«i.T»SASCKtLWPOOOI 

TS^yTCmmTTCAACATATCAATTAOUUKCACTCCCACA^ ^ 
^^y^^^^p ^A j ><*A*<|»*A ft ttcti tac T r AA TPT TTCCTCA^Cft fCTC AAA < CCTCCT TACC6TCC A AC A ACC AC TACCTCTCTC ACAAC ACTACC fTCTACT ATTCTT ACAACTCC TCCT 

0CSS$St ^OLCrTC T» l.Ct40»tC0ACSL»HC0OKll»CCO 

1MB 

TgjU^TAjUUUUA^AATCCCTTKATT^ 
ACmArrTTTTTCTCACCCAAC«TAA«T«TeT^ 

C V K K C S ¥ P L M C T I S I «K"SCS«fftSI PP*CSCO« l TO I OPS 

CTTOCCAQQTTTCCCTCA^ATCrrCACTACCCATATTCACA^ 1 '** ^ 

^^^i^r^^^^^^i^TA^rTeATCCrTArAAflTOTATttrTTTgCCAX 

LACrtOMLOtll Y S 0 T « • t « C C I 0 < T C C C LOAFS* C F C K F C 

irTCTTj^CCAttTCAAACACCAATAAC^ 

AAAOAATOCCTCACTTTCTCCrTArrCAATATCCmACCCOTCCACCTCW 

f l ft 9 c t c i t t.ftcWAPCA T V A A L I 8 0 F » • M H M I 0 M T I « C 

QrrTCCTCTCTCCCAftA TTTTTTTCCCAAATAA^CCACATCCTTCACCAX ^ 
c ^ y» f » C *fl^»ff TfTfl l» ^ I* * * at QqTTTA I W flTf TAf^AAATCgraTTA^CACTAgCAAflA^TgATTTCTATCrCT^ 

FCfWC I F L P W « A 0 4 S P P « P M C S ft V « IHBOTPICI * 0 S t PA 

TTttOATCAA^m TCAftTrCACCCACCTCftTCAJU^ tQg0 
AACCTAttTTCAAOACTCAACTCCCTCCACCA^ 

WIKr$¥OAPCCIPVMAtVtOPPKCCK»»FlCMPOP*«P«*». 



t200 



T ^ M l f^AT C ** TCTC A TCTTCXttATWTWTATC<ACCCAATAATTAACACATArCCCAACTTTA^ 
ATCCTAAATACTTACACTACAACCCTACTCATCATA^ 

HITC5MVCHJSWCPI INTyAMr«OORLP«IK«LCtMA¥OI 

Kpn I

CATCOCTATTCA ACACCATTCCTATTATCCTAflTTTTCCftTACCATQTCACAAACmrrTC^ |JJ0 
CTACCQATAACTTCTCCTAA^TAATACCATtAAAA^ 

HAIOCMSTVASr 0 T M»T«rrAPSI«r«TPOOL«Sl.lO«AM 

T CAftlTAaOaCTttCTTCTTeTCATCCATATTftTTtATA^ |M0 
ACTCAATCCC«ACOAACAACA«TACCTATAA«AAOTATCOS*ACOCAttTTTATTAT«AAeCTACC^ 



C L 0 L L 



»LnOt*MSMAtMMTtOCL««FOCTO»»»rM«ai«a 
Fig. 13 (Cont).



rc*Tc*momTaocACTCTCOccTTTTc**cuf •tAWTOWTtcTJLMATTT C t rtTTrc^cciubflAWc<rmAMA<TAcwf i ry i ninm iimi 

. . No»t • ■ An I .. 

TttOMTOACT TCCATCAT0TACACTeetCAm«mcmTAflC^ 
ACCCCACTttAAttaTACTACAmCAOCCCTACCCAACCTCCATC^ 

ixTATjrTiJii raeMnrr'm'r *^^*"^ 

QgCCATTqCCCAT AAATcittmAAATTt m 

C CCQT AACOCCT ATTT ACC T AJtf TCTAAftAAITC TTCTCiyrACTCCTttACCTTTTACCCiCTttT AACACCTATCTAAATgtt 17 6 1 L * T 1 1 ACt AACCTTTTT if iC AACAAI TACSACT 
■'-■A' I -A O • K .'. V t X I C . I ; K /« 0.,C ^-O'-fw ■ * • 0 I » H T L t » ft • W t C « C A" » A C 
AitTt ATttMf E * SrTCTTttTTCgmC 1 « » ACTATTtATTTTCCCTCATCC fcCI A« I C tTCT ACCACTTCArQCCTCttmC AC AfT i TTTAC 1 Ct 1 1 1 1 ATAAATCCTttAAr AOC 

; I M 0 0 A t W A 0 I T 1 A r W C H P K O v* T 0 r H A * © * -P t f P L I O ■ 6 I A 
ATTCttACAAAATOATCACCCTTATTACCATO \ 

TAAttTgrm^^gg^TAATtOTACttflAATcca^ iuu mu i rm 

QCCCAATkTAAAATAAm 

iipj M iiti T nriTTu^^ 

. QtAk ^TCmAAiA^^ 

CftTTOTAaAA C t TCmca ATACCAAAgTACTAAAgACT^^ 

— V-'m V:/t c '"'a ¥ « > ^TV > »^ r «^M'- o-T* t a » V:V: t t;-0; «; i : i » r c *- « • .c t r t r V r 

TCATmACTAACAacTmCAAAnAC<»A ftl TCCCTQCTTCAAATCACaAAAATACAACAmT^^ 
ACTAACCTWTtTCAATAAATtTAATAQCKM^ 



caA0CAmr%rmTfttfccttiWATOTAA4 xww i CA0 Tccm 

QClXATAAAATAflAAAC reCCCACCAT ACTAn ttOCCAflAACt *0 fi I A A T ACCAT ATACATQflTAAA F CC I A 1 CA TCACCAAATACAAAATCATCTTCT ACTTCTCTTAC 1 1 ttt 1 1 1 1 f T 



WAttTfAAAlTBAABTH" ^^jfHtCQCCTAACATAAATATTTACT 1 1 t lAHTttCt TAAAACAAttAATCttTTAACCTATCCATCTACATTAA 



ACTTCATC7TTCACTTCAC 1 1 I C AT CAA AMCC>ACTCTATCTATAAATC*TTCTCCTA<mAI f 1 1 A T tl TT ACCAATTCCACACCT ACACCT AA C ne t TC CA TAT AACTCTCACCT 

AATCUTATlACTAATAqATCCmAAAmitAACCTt r i W irC 
TT AAAf ATACTAA TCATCT AQAAAATCTCAttCrAAACATCCATAC 






